Abstract

We provide a wide-ranging study of the scenario where a subset of
the tables in a relational schema are visible to a user (that is, their
complete contents are known), while the remaining tables are invisible.
The schema also has a set of integrity constraints, which may relate the 
visible tables to invisible ones but also may constrain both the visible and 
invisible instances. We want to determine whether information about 
a user query can be inferred using only the visible information and the 
constraints. We consider whether positive information about the query 
can be inferred, and also whether negative information (the query does 
not hold) can be inferred. We further consider both the instance-level 
version of the problem (the visible table extensions are given) and the 
schema-level version, where we want to know whether information can be 
leaked in some instance of the schema. Our instance-level results classify 
the complexity of these problems, both as a function of all inputs, and in 
the size of the instance alone. Our schema-level results exhibit an unusual 
dividing line between decidable and undecidable cases. 


1 Introduction 

There are many application scenarios where a collection of datasources is
defined, but a given user or class of users has access to only a subset of these
sources. For example, for privacy reasons a data owner may explicitly restrict
access to a subset of the stored tables, or to virtual tables defined via queries.
Restricted access can also emerge naturally in data integration, where some
datasources may be virtual and are defined via mappings to sources. In this
case, the virtual tables are not accessible (to the middleware) but the backend
sources are. Many of these scenarios can be subsumed by considering a schema
consisting of a set of relations related by integrity constraints, with only a
subset of the relations accessible. A basic question is whether a given data design
of this form renders some information inaccessible. Traditional access control
mechanisms can restrict explicit access, but they cannot prevent “information
leakage” that may occur due to the presence of semantic relationships either
between datasources or within one datasource. For example, if there are
referential constraints between relations R and S in a database, a designer who
wants to restrict users from accessing the information in R may also have to
restrict access to S.

In this work we consider exactly this scenario, where a set of semantically
related relations are hidden while for another set the complete contents are
visible. We will consider semantic relationships specified in a variety of
languages that are rich enough to capture complex relationships between sources,
including relationships that arise in data integration, as well as common
integrity constraints within a single source, such as referential constraints. The
basic analysis problem we will consider is the following: given a schema
and a (for simplicity, Boolean) query Q, can we infer, using data and schema
information, that the result of Q is true, or that it is false?

Example 1. Consider a medical datasource with relation Appointment(p, a, ...)
containing patient names p, appointment ids a, and other information about the
appointment, such as the name of the doctor. A data owner makes available
one projection of Appointment by creating a relation Patient(p) defined by the
constraints:

$$\forall p\; \big(\mathrm{Patient}(p) \rightarrow \exists a\, d\, y\; \mathrm{Appointment}(p, a, d, y)\big)$$
$$\forall p\, a\, d\, y\; \big(\mathrm{Appointment}(p, a, d, y) \rightarrow \mathrm{Patient}(p)\big)$$

The query $Q = \exists a\, y\; \mathrm{Appointment}(\text{“Smith”}, a, \text{“Jones”}, y)$, asking whether patient
Smith made an appointment with Dr. Jones, is secure under this schema
in one sense: an external user with access to Patient will never be sure that
the query is true. On the other hand, on an instance where the visible relation
Patient is empty, an external user will know that the query is false.

We will say that there is a Negative Query Implication on the visible instance
where Patient is empty, since a user can determine that the query is false.

Our results. We will consider the instance-based problems: given a query
and an instance, can a user determine that the query is true (a Positive
Query Implication) or false (a Negative Query Implication)? We also look at the
corresponding schema-level problem: given a schema, is there some instance
where a query implication of one of the above types occurs?

We start by observing that the instance-level problems, both positive and
negative, are decidable for a very broad class of constraints. However, when we
analyze the complexity of the decision problem as the size of the instance
increases, we see surprisingly different behavior between the positive and negative
cases. For very simple constraints, such as inclusion dependencies, the negative
query implication problems are very well-behaved as the instance changes: they
are solvable in polynomial time and definable within a well-behaved query language.
For the same class of constraints, the corresponding positive query implication
problems are hard even when the schema and query are fixed.
When we turn to the schema-level problems, even decidability is not obvious.
We prove a set of “critical instance” results, showing that whenever there is
an instance where information about the query can be implied, the “obvious
instance” works. Thus the schema-level problems reduce to special cases of the
instance-level problems. Although we use this technique to obtain decidability
and complexity results for both positive and negative query implication, the
classes of constraints to which it applies differ in the two cases. We give
undecidability results showing that when these classes are even slightly enlarged,
it becomes undecidable whether a schema admits an instance with a query implication.

Our techniques. In the process, we introduce a number of tools for
querying mixtures of complete and incomplete information.

• Embeddings in rich decidable logics. Our first technique involves
showing that a large class of instance-based problems can be solved by
translating them into satisfiability problems within a rich fragment of
first-order logic, the guarded negation fragment, and then applying recently
developed techniques for analyzing this logic. As we will show, this allows
us to make use of powerful prior decidability results “off-the-shelf”. But
to get tight complexity bounds, we also require a new analysis of the
complexity of these logics.

• Decidability via canonical counterexamples. The schema-level
analysis asks if there is some instance on which information about the query
can be derived. As mentioned above, we show that whenever there is
such an instance, it can be taken to be the “simplest possible instance”.
While this idea has been used before to simplify the analysis of undecidability
(e.g. [GM14]), we give a broad result that allows it to be used for
decidability.

• Tractability via Greatest Fixed-point logic. For our instance-level
problems concerning inference of negative information, we introduce a
new technique showing that the problem can be reduced to evaluating a
query in greatest fixed-point Datalog (GFP-Datalog) on the instance. Since
GFP-Datalog queries can be evaluated in polynomial time, this shows
tractability in the instance size. This is in contrast to methods used in
open-world query answering based on definability in Datalog, a subset
of least fixed-point logic. The reduction to GFP-Datalog requires a new
analysis of when these inference problems are “active-domain controllable”
(it suffices to check that the query value is invariant over all hidden databases
that lie within the active domain of the visible instance).

• Relationships between problems. We prove reductions relating the
positive and negative versions of our problems, relating the schema- and 
instance-level problems, and relating our problems to the widely-studied 
“certain answer problem”. We apply these reductions to get upper and 
lower bounds for our problems. 
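The polynomial-time bound for GFP-Datalog comes from evaluating greatest fixpoints by downward iteration: start from all candidate tuples and repeatedly discard those that the rules no longer support. The Python sketch below illustrates only this evaluation strategy, on a hypothetical one-rule program `Alive(x) :- Edge(x, y), Alive(y)` over a hypothetical `Edge` relation; it is not the reduction from negative query implication developed in this work.

```python
def greatest_fixpoint(domain, step):
    """Greatest fixpoint of a monotone operator `step` on subsets of `domain`,
    computed by iterating downward from the full set until nothing changes."""
    current = frozenset(domain)
    while True:
        nxt = frozenset(step(current))
        if nxt == current:
            return current
        current = nxt


def alive_nodes(edges):
    """Hypothetical GFP-Datalog rule: Alive(x) :- Edge(x, y), Alive(y).
    Its greatest fixpoint is the set of nodes that start an infinite path."""
    nodes = {u for u, _ in edges} | {v for _, v in edges}

    def step(alive):
        # Keep x only if some successor y is still considered alive.
        return {x for (x, y) in edges if y in alive}

    return greatest_fixpoint(nodes, step)


edges = {("a", "b"), ("b", "a"), ("c", "a"), ("d", "e")}
print(sorted(alive_nodes(edges)))  # ['a', 'b', 'c']
```

Under least fixed-point (ordinary Datalog) semantics the same rule defines the empty set, since the upward iteration never gets started; the greatest fixpoint instead keeps exactly the nodes from which an infinite path exists. Each round either removes an element or stops, so there are at most |domain| + 1 rounds.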

In addition to the techniques above, our lower bound results involve a number 
of techniques for coding computation in query inference problems. 

Related Work. Two different communities have studied the problem of
determining the information that can be inferred from complete access to data in
a subset of the relations in a relational schema, using constraints that relate the
subset to the full vocabulary.

In the database community, the focus has been on views. The schema is
divided into the “base tables” and “view tables”, with the latter being
defined by queries (typically conjunctive queries) in terms of the former. Given a
query over the schema, the basic computational problem is determining which
answers can be inferred using only the values of the views. Abiteboul and
Duschka [AD98] isolate the complexity of this problem in the case where views
are defined by conjunctive queries; in their terminology, it is “querying under
the Closed World Assumption”, emphasizing the fact that the possible worlds
revealed by the views are those where the view tables have exactly their visible
content. In our terminology, this corresponds exactly to the “Positive Query
Implication” (PQI) problem in the case where the constraints consist entirely
of conjunctive query view definitions. Chirkova and Yu [CY14] extend this to the
case where conjunctive query views are supplemented by weakly acyclic
dependencies. Another subcase of PQI that has received considerable attention is the
case where the constraints consist only of “completeness assertions” between
the invisible and visible portions of the schema. A series of papers by Fan and
Geerts [FG10a, FG10b] isolate the complexity for several variations of the
problem, with particular attention to the case where the completeness assertions are
given by inclusion dependencies from the invisible to the visible part.

The PQI problem is also related to work on instance-based determinacy (see
in particular the results of Howe et al. in [KUB+12]), while the “Negative Query
Implication” (NQI) problem is studied in the view context by Mendelzon and
Zhang [ZM05], under the name of “conditional emptiness”. In both cases, the
emphasis has been on view definitions rather than more general constraints
which may restrict both the visible and invisible instance. In contrast, in our
work we deal with constraint classes that can restrict the visible and invisible
data in ways incomparable to view definitions (see also the comparison in Section 5).

In the description logic community, the emphasis has not been on views,
but on querying incomplete information with constraints. Our positive query
implication problems relate to work in the description logic community on
hybrid closed and open world query answering or DBoxes, in which the schema is
divided into closed-world and open-world relations. Given a Boolean CQ, we
want to find out if it holds in all instances that can add facts to the open-world
relations but do not change the closed-world relations. In the non-Boolean case,
the generalization is to consider which tuples from the initial instance are in the
query answer on all such instances. Thus closed-world and open-world relations
match our notions of visible and invisible, and the hybrid closed and open world
query answering problem matches our notion of positive query implication,
except that we restrict to the case where the open-world/invisible relations of the
instance are empty. It is easy to see that this restriction is actually without
loss of generality: one can reduce the general case to the case we study with a
simple linear-time reduction, making a closed-world copy R' of each open-world
relation R, and adding an inclusion dependency from R' to R. As in the
database community, the main distinction between our study of the Positive
Query Implication problem and the prior work in the DL community concerns
the classes of constraints considered. Lutz et al. [LSW12] study the
complexity of this problem for the constraint languages EL and DL-Lite, giving a
dichotomy between coNP-hard and first-order rewritable sets of constraints.
They also show that in all the tractable cases, the problem coincides with the
classical open-world query answering problem. Franconi et al. [FIS11] show
coNP-completeness for a disjunction-free description logic. Our results on the
data complexity of PQI consider the same problem, but for decidable constraint
languages that are more expressive and, in particular, can handle relations of
arbitrary arity, rather than arity at most 2 as in [LSW12, FIS11].

In summary, both the database and DL communities considered the Positive
Query Implication questions addressed in this paper, but for constraint classes
that are different from those we consider. The Negative Query Implication
problems are not well-studied in the prior literature, and we know of no prior work
dealing with the schema-level questions (asking for the existence of an instance
with a query implication). However, in this paper we show (see
Subsection 4.2) that there is a close relation between the existence questions
and work of Lutz et al. on conservativity and modularity of constraints
[LW07, KLWW13].

Note that our schema-level analysis considers the existence of some instance 
where the query result can be inferred. In contrast, the work of Miklau and 
Suciu [MS07] considers whether a “typical” instance allows such inference. We 
do not deal with probabilistic modelling in this work. 


2 Definitions 

We consider partitioned schemas (or simply, schemas) $\mathcal{S} = \mathcal{S}_h \cup \mathcal{S}_v$, where the
partition elements $\mathcal{S}_h$ and $\mathcal{S}_v$ are finite sets of relation names (or simply,
relations), each with an associated arity. These are the hidden and visible relations,
respectively. An instance of a schema maps each relation to a set of tuples
of the associated arity. Instances will be used as inputs to the computational
problems that are the focus of this work; in this case the instances must be
finite. Our computational problems also quantify over instances, and they are
also well-defined when the quantification is over all (finite or infinite) instances.
For simplicity, by default instances are always finite. However, as we will show,
letting any of these quantifications range over all instances never affects our results,
and this will allow us to make use of infinite instances freely in our proofs. The
active domain of an instance is the set of values occurring within the
interpretation of some relation in the instance.

As a suggestive notation, we write $V$ for instances over $\mathcal{S}_v$ and $\mathcal{F}$ for instances
over $\mathcal{S}$. Given an instance $\mathcal{F}$ for $\mathcal{S}$, its restriction to the $\mathcal{S}_v$ relations will be
referred to as its visible part, denoted $\mathrm{Visible}(\mathcal{F})$.

We will look at integrity constraints defined by Tuple-generating Dependencies
(TGDs), which are first-order logic sentences of the form

$$\forall \vec{x}\; \big(\phi(\vec{x}) \rightarrow \exists \vec{y}\; \rho(\vec{x}, \vec{y})\big)$$

where $\phi$ and $\rho$ are conjunctions of atoms, which may contain variables and/or
constants, and where all the universally quantified variables $\vec{x}$ appear in $\phi$.
For all the problems considered in this work, one can take w.l.o.g. the
right-hand side $\rho$ to consist of a single atom, and we will assume this henceforth.
We will often omit the universal quantifiers, writing just $\phi(\vec{x}) \rightarrow \exists \vec{y}\; \rho(\vec{x}, \vec{y})$.

Several classes of TGDs will be of particular interest: 

• Linear TGDs: those where $\phi$ consists of a single atom.

• Inclusion Dependencies (IDs): linear TGDs where each of $\phi$ and $\rho$ has
no constants and no repeated variables. These correspond to traditional
referential constraints.

• Many of our results on inclusion dependencies will hold for two more
general classes. Frontier-guarded TGDs (FGTGDs) [BLMS09] are TGDs
where one of the conjuncts of $\phi$ is an atom that includes every universally
quantified variable $x_i$ occurring in $\rho$. Connected TGDs require only that
the co-occurrence graph of $\phi$ is connected. The nodes of this graph are the
variables $\vec{x}$, and two variables are connected by an edge if they co-occur in an
atom of $\phi$.
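To make these syntactic classes concrete, here is a minimal Python sketch that classifies a single-head TGD. The representation is a hypothetical one chosen for illustration: an atom is a pair `(relation, args)`, a variable is a plain string, and a constant is wrapped in a `Const` class. The example shows that $F_1(x) \rightarrow \exists y\, U(x, y)$ is an ID, while the rule $A(x, y) \wedge B(y, z) \rightarrow V(x, z)$ is connected but not frontier-guarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: str  # a constant; any plain str in an atom is a variable


def variables(atoms):
    return {t for _, args in atoms for t in args if isinstance(t, str)}


def is_linear(body, head):
    return len(body) == 1


def _plain(atom):
    """No constants and no repeated variables."""
    _, args = atom
    return all(isinstance(t, str) for t in args) and len(set(args)) == len(args)


def is_inclusion_dependency(body, head):
    return is_linear(body, head) and _plain(body[0]) and _plain(head)


def is_frontier_guarded(body, head):
    # Frontier: universally quantified variables that also occur in the head.
    frontier = variables(body) & variables([head])
    return any(frontier <= variables([atom]) for atom in body)


def is_connected(body, head):
    """Is the co-occurrence graph of the body variables connected?"""
    vs = variables(body)
    if not vs:
        return True
    start = next(iter(vs))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for atom in body:
            avs = variables([atom])
            if v in avs:
                for w in avs - seen:
                    seen.add(w)
                    stack.append(w)
    return seen == vs


id_tgd = ([("F1", ("x",))], ("U", ("x", "y")))
view_tgd = ([("A", ("x", "y")), ("B", ("y", "z"))], ("V", ("x", "z")))
print(is_inclusion_dependency(*id_tgd), is_frontier_guarded(*id_tgd))  # True True
print(is_frontier_guarded(*view_tgd), is_connected(*view_tgd))  # False True
```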

Note that every ID is a linear TGD, and every linear TGD is frontier-guarded. 
We will also consider two constraint languages that are generalizations of 
FGTGDs. 

• We allow disjunction, by considering Disjunctive Frontier-guarded TGDs,
which are of the form

$$\forall \vec{x}\; \Big(\phi(\vec{x}) \rightarrow \exists \vec{y}\; \bigvee_i \rho_i(\vec{x}, \vec{y})\Big)$$

where each $\rho_i$ is a conjunction of atoms and there is one atom conjoined
in $\phi$ that includes every variable $x_i$ included in some $\rho_i$.

• Many of our results apply to an even richer constraint language containing
Disjunctive FGTGDs, the Guarded Negation Fragment, denoted GNFO.
GNFO is built up inductively according to the grammar:

$$\phi ::= R(\vec{t}) \mid t_1 = t_2 \mid \exists x\, \phi \mid \phi \vee \phi \mid \phi \wedge \phi \mid R(\vec{t}, \vec{y}) \wedge \neg\phi(\vec{y})$$

where $R$ is either a relation symbol or the equality relation $x = y$, and the
$t_i$ represent either variables or constants. Notice that any use of negation
must occur conjoined with an atomic relation that contains all the free
variables of the negated formula; such an atomic relation is a guard of
the formula. In database terms, GNFO is equivalent to relational algebra
where the difference operator can only be used to subtract query results
from a relation. The VLDB paper [BtCO12] gives both relational algebra
and SQL-based syntax for GNFO, and argues that it covers useful queries
and constraints in practice.
For simplicity (so that all of our constraints are well-defined on instances) we will
always assume that our GNFO formulas are domain-independent; to enforce this
we can use the relational algebra syntax for capturing these queries mentioned
above. The reader only needs to know a few facts about GNFO. The first
is that it is quite expressive, so in proving things about GNFO constraints
we immediately get the results for many classes of constraints that we have
mentioned above. GNFO contains every positive existential formula, is closed
under Boolean combinations of sentences, and subsumes disjunctive
frontier-guarded TGDs up to equivalence. That is, by simply writing out a disjunctive
frontier-guarded TGD using $\neg$, $\exists$, $\wedge$ as $\neg \exists \vec{x}\, \big(\phi(\vec{x}) \wedge \neg \exists \vec{y} \bigvee_i \rho_i(\vec{x}, \vec{y})\big)$,
one sees that these are expressible in GNFO: the inner negation is guarded by the
atom of $\phi$ that contains every frontier variable.

Secondly, we will use that GNFO is “tame”, as encapsulated in the following
result from [BtCS11]:

Theorem 1 ([BtCS11]). Satisfiability for GNFO sentences can be tested
effectively, and is 2ExpTime-complete. Furthermore, every satisfiable sentence has
a finite satisfying model.

Note that GNFO does not subsume the constraints corresponding to CQ
view definitions (e.g. $A(x, y) \wedge B(y, z) \rightarrow V(x, z)$ cannot be expressed in GNFO).
However, we will cover this special class of constraints in Section 5.

Finally, we will consider Equality-generating Dependencies (EGDs), of the
form

$$\forall \vec{x}\; \big(\phi(\vec{x}) \rightarrow x_i = x_j\big)$$

where $\phi$ is a conjunction of atoms and $x_i, x_j$ are variables. EGDs generalize
well-known relational database constraints, such as functional dependencies and key
constraints. EGDs with constants further allow equalities between variables and
constants, e.g. $x_i = a$, in the right-hand side.
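Key constraints are a typical special case: a functional dependency $X \rightarrow Y$ on a relation $R$ is a set of EGDs, one for each position in $Y$, whose body consists of two $R$-atoms sharing variables on the positions in $X$. The Python sketch below checks such a dependency on a finite relation and returns a violating pair of tuples. The relation contents, and the choice of the appointment id as a key of the Appointment relation from Example 1, are hypothetical.

```python
def fd_violation(tuples, lhs, rhs):
    """Return a pair of tuples violating the functional dependency lhs -> rhs
    (given as lists of attribute positions), or None if it holds. Each rhs
    position corresponds to one EGD R(...) & R(...) -> x_i = x_j."""
    first_seen = {}  # lhs values -> (witness tuple, its rhs values)
    for t in tuples:
        key = tuple(t[i] for i in lhs)
        val = tuple(t[i] for i in rhs)
        if key in first_seen and first_seen[key][1] != val:
            return first_seen[key][0], t
        first_seen.setdefault(key, (t, val))
    return None


# Hypothetical data for Appointment(p, a, d, y), with the id a (position 1) as a key.
appointments = [
    ("Smith", "a1", "Jones", "2015"),
    ("Lee", "a2", "Jones", "2015"),
    ("Lee", "a1", "Jones", "2015"),
]
print(fd_violation(appointments, [1], [0, 2, 3]))
# (('Smith', 'a1', 'Jones', '2015'), ('Lee', 'a1', 'Jones', '2015'))
print(fd_violation(appointments[:2], [1], [0, 2, 3]))  # None
```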

For our query language we consider conjunctive queries (CQs), first-order
formulas built up from relational atoms via conjunction and existential
quantification (equivalently, relational algebra queries built via selection, projection,
join, and rename operations), and also unions of CQs (UCQs), which are
disjunctions (relational algebra UNIONs) of CQs. Boolean UCQs are simply UCQs
with no free variables. Every CQ $Q$ is associated with a canonical database
$\mathrm{CanonDB}(Q)$, where the domain consists of the variables and constants of $Q$ and
the facts are the atoms of $Q$.
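A Boolean CQ holds in an instance exactly when there is a homomorphism from its canonical database into the instance that fixes every constant. The following Python sketch makes this concrete with a naive backtracking search (exponential in the size of the query, polynomial for a fixed query). The representation (atoms as `(relation, args)`, variables as plain strings, constants wrapped in a hypothetical `Const` class, instances as sets of facts) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: str


def canon_db(query):
    """CanonDB(Q): one fact per atom of Q; constants keep their values,
    each variable x becomes a fresh element ('var', x)."""
    return {
        (rel, tuple(t.value if isinstance(t, Const) else ("var", t) for t in args))
        for rel, args in query
    }


def holds(query, instance):
    """Decide a Boolean CQ by searching for a homomorphism from its atoms
    into `instance` that maps each Const(c) to the value c."""
    atoms = list(query)

    def extend(i, assignment):
        if i == len(atoms):
            return True
        rel, args = atoms[i]
        for frel, fargs in instance:
            if frel != rel or len(fargs) != len(args):
                continue
            new = dict(assignment)
            ok = True
            for term, value in zip(args, fargs):
                if isinstance(term, Const):
                    ok = term.value == value
                else:
                    ok = new.setdefault(term, value) == value
                if not ok:
                    break
            if ok and extend(i + 1, new):
                return True
        return False

    return extend(0, {})


# Q = exists a, y Appointment("Smith", a, "Jones", y) from Example 1.
q = [("Appointment", (Const("Smith"), "a", Const("Jones"), "y"))]
print(holds(q, {("Appointment", ("Smith", "a1", "Jones", "2015"))}))  # True
print(holds(q, set()))  # False
print(holds(q, canon_db(q)))  # True: every CQ holds in its canonical database
```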

We will always assume that we have associated with each value a
corresponding constant, and we will identify each constant with its value. Thus
distinct constants will always be forced to denote distinct domain elements;
this is often called the “unique name assumption” (UNA) [AHV95]. While the
presence or absence of constants will often make no difference in our results,
there are several problems where their presence adds significant complications.
In contrast, it is easy to show that the presence of constants without the UNA
will never make any difference in any of our results. Note that in our constraint
and query languages above, with the exception of IDs, constants are allowed by
default. When we want to restrict to formulas without constants, we add the
prefix NoConst; e.g. NoConst FGTGD denotes the frontier-guarded TGDs
that do not contain constants.

The crucial definition for our work is the following: 

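Definition 2. Let $Q$ be a Boolean UCQ over schema $\mathcal{S}$, $\mathcal{C}$ a set of constraints
over $\mathcal{S}$, and $V$ an instance over a visible schema $\mathcal{S}_v \subseteq \mathcal{S}$.

• $\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V) = \mathrm{true}$ if for every finite instance $\mathcal{F}$ satisfying $\mathcal{C}$, if
$V = \mathrm{Visible}(\mathcal{F})$ then $Q(\mathcal{F}) = \mathrm{true}$.

• $\mathrm{NQI}(Q, \mathcal{C}, \mathcal{S}, V) = \mathrm{true}$ if for every finite instance $\mathcal{F}$ satisfying $\mathcal{C}$, if
$V = \mathrm{Visible}(\mathcal{F})$ then $Q(\mathcal{F}) = \mathrm{false}$.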

We call an $\mathcal{S}_v$-instance $V$ realizable w.r.t. $\mathcal{C}$ if there is an $\mathcal{S}$-instance $\mathcal{F}$
satisfying $\mathcal{C}$ such that $V = \mathrm{Visible}(\mathcal{F})$. If an instance $V$ is not realizable w.r.t. $\mathcal{C}$,
then, trivially, $\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V) = \mathrm{NQI}(Q, \mathcal{C}, \mathcal{S}, V) = \mathrm{true}$. In practice, realizable
instances are the only $\mathcal{S}_v$-instances we should ever encounter. For simplicity we
state our instance-level results for the PQI and NQI problems that take as input
an arbitrary instance of $\mathcal{S}_v$. But since our lower bound arguments will only
involve realizable instances, an alternative definition that assumes realizable
inputs yields the same complexity bounds.

$\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V)$ states something about every finite instance, in line with
our default assumption that instances are finite. We can also talk about an
“unrestricted version” where the quantification is over every (finite or infinite)
instance. For the constraints we deal with, there will be no difference between
these notions. That is, we will show that the finite and unrestricted versions of
PQI coincide for a given class of arguments $Q, \mathcal{C}, \mathcal{S}, V$. We express this by saying
that “$\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V)$ is finitely controllable”, and similarly for NQI.

Often we will be interested in studying the behavior of these problems when
$Q$, $\mathcal{S}$ and $\mathcal{C}$ are fixed, e.g. looking at how the computation time varies with the size of
$V$ only. We refer to this as the data complexity of the PQI (resp. NQI) problem.

The PQI problem contrasts with the usual Open-World Query Answering or
Certain Answer problem, denoted here $\mathrm{OWQ}(Q, \mathcal{C}, I)$, which is studied
extensively in databases and description logics. The latter problem takes as input
a Boolean query $Q$, an instance $I$, and a set of constraints $\mathcal{C}$, and returns true
iff the query holds in every finite instance $I'$ that satisfies $\mathcal{C}$ and contains all facts of $I$. In PQI
(and NQI) we further constrain the instance to be fixed on the visible part, while
requiring the invisible part of the input instance to be empty. This is the mix
of “Closed World” and “Open World”, and we will see that this Closed World
restriction can make the complexity significantly higher.

Example 2. Consider a schema with inclusion dependencies $F_1(x) \rightarrow
\exists y\; U(x, y)$ and $U(x, y) \rightarrow F_2(y)$, where $F_1$ and $F_2$ are visible but $U$ is
not. Consider the query $Q = \exists x\; U(x, x)$ and the instance consisting only of the facts
$F_1(a), F_2(a)$.

There is a PQI on this instance, since $F_1(a)$ implies that $U(a, c)$ holds for
some $c$, but the other constraint and the fact that $F_2$ must hold only of $a$ mean
that $c = a$, and hence $Q$ holds.

In contrast, one can easily see that $Q$ is not certain in the usual sense, where
$F_1$ and $F_2$ can be freely extended with additional facts (for instance, add $F_2(b)$ and let $U$ contain only $U(a, b)$).
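Example 2 can be checked mechanically on a small domain. The Python sketch below enumerates every hidden relation $U$ over a fixed finite domain (here $a$ plus one hypothetical fresh element `n1`), keeps the instances that satisfy both inclusion dependencies, and looks for one in which $Q$ fails. With the visible relations fixed it finds no counterexample, in line with the PQI argument above; when $F_1$ and $F_2$ may also gain facts (the open-world reading), it finds one. Such a bounded search is only a sanity check: a counterexample it finds is genuine, but failing to find one over a fixed domain does not by itself establish PQI.

```python
from itertools import chain, combinations, product


def subsets(items):
    """All subsets of `items`, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def satisfies(f1, f2, u):
    """Check F1(x) -> exists y U(x, y) and U(x, y) -> F2(y)."""
    every_f1_has_successor = all(any(x2 == x for x2, _ in u) for x in f1)
    targets_in_f2 = all(y in f2 for _, y in u)
    return every_f1_has_successor and targets_in_f2


def query(u):
    """Q = exists x U(x, x)."""
    return any(x == y for x, y in u)


def counterexample(f1, f2, domain, open_world=False):
    """Search models over `domain` that satisfy the constraints and whose
    visible part equals (f1, f2), or merely contains it if open_world is set.
    Return a model (F1, F2, U) in which Q is false, or None if none exists."""
    if open_world:
        f1_options = [set(s) | f1 for s in subsets(domain)]
        f2_options = [set(s) | f2 for s in subsets(domain)]
    else:
        f1_options, f2_options = [f1], [f2]
    pairs = list(product(domain, repeat=2))
    for g1, g2 in product(f1_options, f2_options):
        for u in map(set, subsets(pairs)):
            if satisfies(g1, g2, u) and not query(u):
                return g1, g2, u
    return None


domain = ["a", "n1"]
print(counterexample({"a"}, {"a"}, domain))  # None: no model over this domain falsifies Q
print(counterexample({"a"}, {"a"}, domain, open_world=True) is not None)  # True
```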



Our schema-level problems concern determining if there is a realizable
instance that admits a query implication:

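Definition 3. For $Q$ a Boolean conjunctive query over schema $\mathcal{S}$, and $\mathcal{C}$ a set
of constraints over $\mathcal{S}$, we let:

• $\exists\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}) = \mathrm{true}$ if there is a realizable $\mathcal{S}_v$-instance $V$ such that
$\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V) = \mathrm{true}$;

• $\exists\mathrm{NQI}(Q, \mathcal{C}, \mathcal{S}) = \mathrm{true}$ if there is a realizable $\mathcal{S}_v$-instance $V$ such that
$\mathrm{NQI}(Q, \mathcal{C}, \mathcal{S}, V) = \mathrm{true}$.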

Note that these problems now quantify over instances twice, and hence there
are alternatives depending on whether the instance $V$ is restricted to be finite,
and whether the hidden instances $\mathcal{F}$ are restricted to be finite. For a class
of inputs $Q, \mathcal{C}, \mathcal{S}$, we say that $\exists\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S})$ is “finitely controllable” if in both
quantifications, quantification over finite instances can be freely replaced with
quantification over arbitrary instances.


3 Positive Query Implication 


3.1 Instance problems 


We begin with a study of PQI. We will show that PQI is decidable for the 
rich constraint language GNFO, the guarded negation fragment, which includes 
guarded TGDs, disjunctive guarded TGDs, and Boolean combinations with 
Boolean CQs. The key is that we can translate the PQI problem to a satisfiability 
problem for GNFO. 


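Theorem 4. The problem $\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V)$, as $Q$ ranges over Boolean UCQs
and $\mathcal{C}$ over GNFO constraints, is in 2ExpTime.

Furthermore, for such constraints the problem is finitely controllable, that is,
$\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V) = \mathrm{true}$ iff for every instance $\mathcal{F}$ (of any size) satisfying $\mathcal{C}$, if
$V = \mathrm{Visible}(\mathcal{F})$, then $Q(\mathcal{F}) = \mathrm{true}$.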

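Proof. We just note that $\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V)$ holds iff the following formula is
unsatisfiable:

$$\phi^{\mathrm{PQItoGNF}}_{Q,\mathcal{C},\mathcal{S},V} \;=\; \neg Q \;\wedge\; \bigwedge \mathcal{C} \;\wedge\; \bigwedge_{R \in \mathcal{S}_v} \Big( \bigwedge_{R(\vec{a}) \in V} R(\vec{a}) \;\wedge\; \forall \vec{x}\, \big(R(\vec{x}) \rightarrow \bigvee_{R(\vec{a}) \in V} \vec{x} = \vec{a}\big) \Big)$$

If the constraints are in GNFO, then the formula above is also in GNFO. The
finite controllability of $\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V)$ comes from the finite model property of
GNFO formulas (Theorem 1). □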
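The only instance-dependent part of $\phi^{\mathrm{PQItoGNF}}_{Q,\mathcal{C},\mathcal{S},V}$ is the description of $V$: its facts plus, for each visible relation $R$, a closure axiom whose negated disjunction is guarded by $R(\vec{x})$. The Python sketch below renders that part as a string in an ad hoc ASCII syntax (`&`, `|`, `->`, `forall`), assuming every visible relation has arity at least 1 and that constants are written as plain names. The conjuncts $\neg Q$ and $\bigwedge \mathcal{C}$ would be added separately, and no GNFO satisfiability checker is included.

```python
def visible_closure(visible, arity):
    """Render the V-dependent conjuncts of the PQI-to-GNFO formula.

    visible: dict mapping a visible relation name to a set of constant tuples.
    arity:   dict mapping every visible relation name to its arity (>= 1).
    """
    conjuncts = []
    for rel in sorted(arity):
        xs = [f"x{i}" for i in range(1, arity[rel] + 1)]
        facts = sorted(visible.get(rel, set()))
        # The facts R(a) of V.
        conjuncts += [f"{rel}({', '.join(tup)})" for tup in facts]
        # Closure axiom: every R-tuple is one of the listed facts.
        cases = ["(" + " & ".join(f"{x} = {c}" for x, c in zip(xs, tup)) + ")" for tup in facts]
        disjunction = " | ".join(cases) if cases else "false"
        conjuncts.append(f"forall {' '.join(xs)} ({rel}({', '.join(xs)}) -> {disjunction})")
    return " & ".join(conjuncts)


# Visible instance of Example 2: F1(a), F2(a).
print(visible_closure({"F1": {("a",)}, "F2": {("a",)}}, {"F1": 1, "F2": 1}))
# F1(a) & forall x1 (F1(x1) -> (x1 = a)) & F2(a) & forall x1 (F2(x1) -> (x1 = a))

# Example 1 with an empty Patient relation: the closure axiom forces Patient to be empty.
print(visible_closure({}, {"Patient": 1}))
# forall x1 (Patient(x1) -> false)
```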


Above we are using results on satisfiability of GNFO as a “black box”.
Satisfiability tests for GNFO work by translating a satisfiability problem for a
formula into a tree automaton which must be tested for non-emptiness. By a
finer analysis of this translation of GNFO formulas to automata, we can see
that the data complexity of the problem is only singly exponential.
Theorem 5. If $Q$ is a Boolean UCQ and $\mathcal{C}$ is a conjunction of GNFO
constraints over a schema $\mathcal{S}$, then the data complexity of $\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V)$ (that is,
as $V$ varies over instances) is in ExpTime.

Proof. We sketch a proof of this, which involves a finer analysis of the
conversion of GNFO formulas to automata. We define a variant of the normal form
introduced in [BtCS11], called GN-normal form, via the following grammar:

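$$\phi ::= \bigvee_i \exists \vec{x}_i \bigwedge_j \psi_{ij}$$

$$\psi ::= \alpha(\vec{x}) \mid \alpha(\vec{x}) \wedge \phi(\vec{x}) \mid \alpha(\vec{x}) \wedge \neg\phi(\vec{x})$$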

where $\alpha(\vec{x})$ is an atomic formula. Further, let us say that a GNFO formula $\phi$
is equality-normalized if

(i) every occurrence of $R(\vec{t})$ in $\phi$ appears in conjunction with

$$\mathrm{distinct}(\vec{t}) := \bigwedge_{t \in \vec{t}} (t = t) \;\wedge\; \bigwedge_{t, t' \in \vec{t},\; t \neq t'} \neg(t = t'),$$

(ii) whenever equalities are used as guards for negations, these equality
guards are of the form $x = x$,

(iii) every occurrence of equality in $\phi$ is either an equality comparison of
constants, or comes from (i) or (ii).

The width of $\phi$, denoted $\mathrm{width}(\phi)$, is the maximum number of free variables
of any subformula of $\phi$.

Let $\phi$ be a GNFO sentence, with $s = |\phi|$. We can construct a (DAG
representation of an) equi-satisfiable equality-normalized GN-normal form sentence
$\phi'$ such that:

• the size of $\phi'$ is at most …

• $\mathrm{width}(\phi') \le s$

where $f$ is a polynomial function independent of $\phi$. For $\phi'$ in GN-normal form,
we define $\mathrm{rank}_{CQ}(\phi')$ to be the maximum number of conjuncts $\psi_i$ in any
CQ-shaped subformula $\exists \vec{x} \bigwedge_i \psi_i$ of $\phi'$, for non-empty $\vec{x}$.

The construction is not difficult, and a more general statement can be found
in Proposition 31 of the preprint available at [BCtCB15].

The following key proposition shows that formulas in GN-normal form can
be translated into automata with size controlled by the CQ-rank and width:

Proposition 6. For every GNFO formula $\phi'$ in GN-normal form, there is an
alternating two-way parity automaton $\mathcal{A}_{\phi'}$ on infinite trees such that $\mathcal{A}_{\phi'}$
recognizes a non-empty language iff $\phi'$ is satisfiable. Moreover, the number of
states of the automaton, and the running time needed to form it, is bounded
by …, where $s' = |\phi'|$, $w' = \mathrm{width}(\phi')$, $r' = \mathrm{rank}_{CQ}(\phi')$, and $f$ is a
polynomial function independent of $\phi'$.
Proposition 6 is proven by creating an automaton whose state set consists
of a collection of formulas derived from $\phi'$. If the formula were guarded, these
would just be subformulas, but for CQ-shaped subformulas one has to
throw in additional formulas, representing guesses as to which of the conjuncts are
true at a given bag of a tree-like structure. The details are again in the
preprint at [BCtCB15]; see Corollary 10 (formation of an automaton based on the
closure) and Lemma 24 (closure size as a function of CQ-rank).

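Now fix a Boolean UCQ $Q$ and a conjunction $\mathcal{C}$ of GNFO constraints over
a schema $\mathcal{S}$. Without loss of generality, we can assume that the constraints in
$\mathcal{C}$ are already in GN-normal form. Consider the formula $\phi^{\mathrm{PQItoGNF}}_{Q,\mathcal{C},\mathcal{S},V}$ in the proof
of Theorem 4:

$$\neg Q \;\wedge\; \bigwedge \mathcal{C} \;\wedge\; \bigwedge_{R \in \mathcal{S}_v} \Big( \bigwedge_{R(\vec{a}) \in V} R(\vec{a}) \;\wedge\; \forall \vec{x}\, \big(R(\vec{x}) \rightarrow \bigvee_{R(\vec{a}) \in V} \vec{x} = \vec{a}\big) \Big).$$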

Note that this formula satisfies all the conditions of the GN-normal form except
those related to equality. Nonetheless, we can further break up the ‘dangerous’
subformulas of the form $\bigvee_{R(\vec{a}) \in V} \vec{x} = \vec{a}$, grouping the disjuncts based on the repetition pattern
in $\vec{a}$ and adding inequalities between the variables of $\vec{x}$ that match the non-repeated positions
in each group. We can also conjoin each equality with a guard $R(\vec{x})$. With
this linear-time transformation, the conditions for normalizing equalities will be
satisfied.

Thus the formula can be normalized in polynomial time, and the
width and $\mathrm{rank}_{CQ}$ of the result are fixed when $Q$, $\mathcal{C}$, and $\mathcal{S}$ are fixed. Applying
Proposition 6, we get a polynomial-sized alternating two-way parity automaton. Since
emptiness of such automata can be checked in ExpTime [Var98], the bound
claimed in the theorem now follows. □

The data complexity bound in Theorem 5 is tight even for inclusion
dependencies. The proof proceeds by showing that a “universal machine” for
alternating PSpace can be constructed by fixing appropriate $Q, \mathcal{C}, \mathcal{S}$ in a PQI
problem.

Theorem 7. There are a Boolean UCQ $Q$ and a set $\mathcal{C}$ of IDs over a schema $\mathcal{S}$
for which the problem $\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V)$ is ExpTime-hard in data complexity.

Proof. We first prove the hardness result using a UCQ $Q$; later, we show how to
generalize this to a CQ. We reduce the acceptance problem for an alternating
PSpace Turing machine $M$ to the negation of $\mathrm{PQI}(Q, \mathcal{C}, \mathcal{S}, V)$.

A configuration of $M$ is defined, as usual, by a control state, a position of the
head on the tape, and a finite string representing the content of the tape, which
is assumed to be empty at the beginning. We distinguish between existential
and universal control states of $M$. The transition function of $M$ describes a set
of target configurations on the basis of the current configuration and, without
loss of generality, we assume that every set of target configurations has
cardinality 2. The computation of $M$ is represented by a tree of configurations, where
the root represents the initial configuration and where every configuration with
an existential control state (resp., a universal control state) has exactly one
successor configuration (resp., two successor configurations). To make the coding
simpler we need to adopt a non-standard acceptance condition. Specifically, we
assume that the Turing machine $M$ never halts, namely, its transition function
is defined on every configuration, and we distinguish two special control states,
$q_{acc}$ and $q_{rej}$. When the machine reaches one of these two states in a
configuration, it loops forever without changing the configuration. We say that $M$
accepts (the empty input) if for all paths in the computation tree, the state $q_{acc}$
is eventually reached; symmetrically, we say that $M$ rejects if there is at least
one path in the computation tree that contains the state $q_{rej}$. Furthermore, we
assume that $M$ begins its computation with the head in the second cell and
never visits the extremal positions of its tape (this can be easily enforced by
marking the second position of the tape and requiring that whenever the marked
position is visited, the machine moves to the right). This latter assumption will
simplify checking that two subsequent configurations are correct with respect
to the transition rules of $M$.

The general idea of the reduction is to create a schema, constraints, and 
query that together represent a “universal machine” for alternating PSpace. 
Given an alternating PSpace machine encoded in the visible instance, an accepting run is “computed” as an arbitrary full instance satisfying the constraints
and violating the query — that is, a witness of the failure of PQI. 

We first devise a schema that includes hidden relations that will store the 
computation tree of a generic alternating PSpace machine. The constraints 
and the query will be used to restrict the hidden relations so as to guarantee 
that the encoding of the computation tree is correct. By “generic” we mean 
that the hidden relations and corresponding constraints will be independent of 
the tape size, number of control states, and transition function of the machine. 
The visible instance will store the “representation” of an alternating PSpace 
machine M — that is, an encoding of M that can be calculated efficiently once 
M is known. This will include the tape size and an encoding of the transition 
function. 

We will then give the reduction that takes an alternating polynomial space 
machine M and instantiates all the visible relations with the encoding. The 
space bound on M will allow us to create the tape size components of the 
visible instance efficiently. In contrast, the hidden relations will store aspects of 
a computation that cannot be computed easily from M.

In summary, below we describe each part of the schema S for computation trees of a machine, along with the polynomial mapping that transforms a machine M into data filling up the visible parts of the schema.

First, we encode the tape (devoid of its content) into a binary relation T. T will be visible, and can be filled efficiently once an input M is known. Given M, it will be filled in the following natural way: it contains all the facts $T(y,y')$, where y is the identifier of a cell and y' is the identifier of the right-successor of this cell in the tape. Recall that the input machine M works on a tape of polynomial length, and hence the visible instance for the relation T also has size polynomial in M. Although the tape length is finite, we need every cell to have a successor; this will be exploited later to detect badly formed encodings of configurations. We thus include the fact $T(y,y)$ in the instance of the visible relation T, where y is the identifier of the rightmost cell of the tape. For similar reasons, we add another visible relation $T_0$, intended to distinguish the first two cells in the tape. We will form $T_0$ from M by filling it with the identifiers of the first two cells in the tape.
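As a small illustration of the visible tape encoding just described, the following Python sketch builds the facts of T and $T_0$ for a tape with a given number of cells. It is a minimal sketch under our own assumptions: cells are identified by the integers 0, 1, ..., and the function name `tape_relations` is hypothetical, not part of the reduction.

```python
def tape_relations(length):
    """Visible facts of T and T_0 for a tape of `length` cells.

    Hypothetical helper: cells are identified by the integers 0..length-1.
    """
    if length < 2:
        raise ValueError("the encoding needs at least two cells")
    # T(y, y'): y' is the right-successor of cell y.
    T = {(y, y + 1) for y in range(length - 1)}
    # Self-loop on the rightmost cell, so that every cell has a successor.
    T.add((length - 1, length - 1))
    # T_0 records the identifiers of the first two cells.
    T0 = {(0, 1)}
    return T, T0


T, T0 = tape_relations(4)
print(sorted(T))  # [(0, 1), (1, 2), (2, 3), (3, 3)]
print(T0)         # {(0, 1)}
```

Both relations have size linear in the tape length, which is what keeps the visible instance polynomial in M.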

As for the configurations of the machine, these are described by specifying, for each configuration and each tape cell, a suitable value that describes the content of that cell, together with the information on whether the Turing machine has its head on the cell, and what the corresponding control state is. For a technical reason (specifically, to allow detecting violations of the transition rules between pairs of subsequent configurations), we adjoin to the labelling of a cell also that of the adjacent cells whenever the head is within the neighbourhood. Formally, the configurations of the machine are encoded by a hidden ternary relation C, where each fact $C(x,y,z)$ indicates that, in the configuration identified by x, the cell y has value z.

We enforce the fact that the cell values range over an appropriate domain by a visible unary relation V. As with all of our visible relations, we can fill V easily once we have a specific input machine M. In our reduction from machine M, we will fill this relation V with

$$V = (\Sigma \times \Sigma \times Q \times \Sigma) \uplus (\Sigma \times Q \times \Sigma) \uplus (\Sigma \times \Sigma \times Q) \uplus \Sigma ,$$

where Σ is the tape alphabet of M and Q is the set of its control states. If a cell has value (a,b,q,c), this means that its content is b, the Turing machine stores the control state q and has the head precisely on this cell, and the neighbouring cells to the left and to the right have labels a and c, respectively. Similarly, if a cell has value (a,q,b) (resp., (b,c,q)), this means that its content is b, the Turing machine stores the control state q, and the head is on the left-successor (resp., right-successor), which carries the letter a (resp., c). In all other cases, we simply store the content b of the cell.
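The following Python sketch enumerates the visible relation V of cell values for a given alphabet and state set, which also makes its polynomial size in M easy to see. The tags "head", "left", "right", and "plain" are an assumption of this sketch, used to keep the four components of the union disjoint even if Σ and Q share symbols; the function name is hypothetical.

```python
from itertools import product


def cell_values(sigma, states):
    """Visible relation V of cell values, as tagged tuples (hypothetical encoding)."""
    V = set()
    # (a, b, q, c): content b, head on this cell in state q, neighbours a and c.
    V |= {("head", a, b, q, c) for a, b, q, c in product(sigma, sigma, states, sigma)}
    # (a, q, b): content b, head on the left neighbour, which carries a.
    V |= {("left", a, q, b) for a, q, b in product(sigma, states, sigma)}
    # (b, c, q): content b, head on the right neighbour, which carries c.
    V |= {("right", b, c, q) for b, c, q in product(sigma, sigma, states)}
    # b: the head is neither on this cell nor next to it.
    V |= {("plain", b) for b in sigma}
    return V


V = cell_values(sigma=["_", "1"], states=["q0", "qacc", "qrej"])
print(len(V))  # 8*3 + 2*(4*3) + 2 = 50
```

In general the sketch produces $|\Sigma|^3|Q| + 2|\Sigma|^2|Q| + |\Sigma|$ values.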

Recall that the tape of the Turing machine will be encoded in the visible relations T and $T_0$. Because we need to associate the same tape with several different configurations, the content of T and $T_0$ will end up being replicated within new hidden ternary relations $T'$ and $T'_0$, where it will be paired with the identifier of a configuration. Intuitively, a fact $T'(x,y,y')$ will indicate that, in the configuration x, the cell y precedes the cell y'. Similarly, $T'_0(x,y,y')$ will indicate that the first two cells of the configuration x are y and y'. Of course, we will enforce the condition that the relations $T'$ and $T'_0$, projected onto the last two attributes, are contained in T and $T_0$, respectively.

We now turn to the encoding of the computation tree. For this, we introduce a visible unary relation $C_0$ that will contain the identifier of the initial configuration. We also introduce the hidden binary relations $S^{\exists}$, $S^{\forall}_1$, and $S^{\forall}_2$. We recall that every configuration x with an existential control state has exactly one successor x' in the computation tree, so we represent this with the fact $S^{\exists}(x,x')$. Symmetrically, every configuration x with a universal control state has exactly two successors $x_1$ and $x_2$ in the computation tree, and we represent this with the facts $S^{\forall}_1(x,x_1)$ and $S^{\forall}_2(x,x_2)$.
So far, we have introduced the visible relations T, $T_0$, V, $C_0$ and the hidden relations C, $T'$, $T'_0$, $S^{\exists}$, $S^{\forall}_1$, $S^{\forall}_2$. These are sufficient to store an encoding of the computation tree of the machine. However, the constraint language only allows inclusion dependencies, which are not powerful enough to guarantee that these relations indeed represent a correct encoding. To overcome this problem, we will later introduce a few additional relations and exploit a union of CQs to detect the possible violations of the intended encoding.

We now list some inclusion dependencies in C that enforce basic constraints 
on the relations. 

• We begin with the constraints on the ordering of the cells in the tape:

$$\begin{aligned}
T_0(y,y') &\to T(y,y') & \qquad T'_0(x,y,y') &\to T'(x,y,y') \\
T'(x,y,y') &\to T(y,y') & \qquad T'(x,y,y') &\to \exists y''\, T'(x,y',y'') \\
T'_0(x,y,y') &\to T_0(y,y') . & &
\end{aligned}$$

• We proceed by enforcing the constraints on the cell values:

$$T'(x,y,y') \to \exists z\, C(x,y,z) \qquad\qquad C(x,y,z) \to V(z) .$$

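• We finally enforce a tree structure on the configurations, assuming that the machine starts in an existential state:

$$\begin{aligned}
C_0(x) &\to \exists x'\, S^{\exists}(x,x') & & \\
S^{\exists}(x,x') &\to \exists x_1\, S^{\forall}_1(x',x_1) & \qquad S^{\exists}(x,x') &\to \exists y\, y'\, T'_0(x,y,y') \\
S^{\exists}(x,x') &\to \exists x_2\, S^{\forall}_2(x',x_2) & & \\
S^{\forall}_1(x,x_1) &\to \exists y\, y'\, T'_0(x,y,y') & \qquad S^{\forall}_1(x,x_1) &\to \exists x'\, S^{\exists}(x_1,x') \\
S^{\forall}_2(x,x_2) &\to \exists y\, y'\, T'_0(x,y,y') & \qquad S^{\forall}_2(x,x_2) &\to \exists x'\, S^{\exists}(x_2,x') .
\end{aligned}$$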


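Next, we explain how to detect badly formed encodings of the computation tree. For this, we use additional visible binary relations $\mathrm{Err}_{I,0}$, $\mathrm{Err}_{I}$, $\mathrm{Err}_{C}$, $\mathrm{Err}_{S^{\exists}}$, $\mathrm{Err}_{S^{\forall}_1}$, and $\mathrm{Err}_{S^{\forall}_2}$, instantiated as follows.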

• The relation $\mathrm{Err}_{I,0}$ contains all the pairs in $V \times V$ except the pair $(z_0,z_1)$, where $z_0$ is the cell value $(\bot,\bot,q_0)$, $z_1$ is the cell value $(\bot,\bot,q_0,\bot)$, $q_0$ is the initial state of the Turing machine M, and ⊥ is the blank tape symbol. Intuitively, $\mathrm{Err}_{I,0}$ contains precisely those pairs of values that cannot be associated with the first two cells in the initial configuration (recall that M starts with the head on the second cell). Similarly, the relation $\mathrm{Err}_{I}$ contains all the pairs in $V \times V$ except the following ones: the previous pair $(z_0,z_1)$, the pair $(z_1,z_2)$, where $z_2 = (\bot,q_0,\bot)$, the pair $(z_2,z_3)$, where $z_3 = \bot$, and the pair $(z_3,z_3)$. Namely, the pairs in $\mathrm{Err}_{I}$ are precisely those that cannot be associated with any two adjacent cells in the initial configuration. Accordingly, we can detect whether the initial configuration is badly formed using a disjunction of the following CQs:

$$\begin{aligned}
Q_{I,0} &= \exists x\, y\, y'\, z\, z'\; C_0(x) \land T'_0(x,y,y') \land C(x,y,z) \land C(x,y',z') \land \mathrm{Err}_{I,0}(z,z') \\
Q_{I} &= \exists x\, y\, y'\, z\, z'\; C_0(x) \land T'(x,y,y') \land C(x,y,z) \land C(x,y',z') \land \mathrm{Err}_{I}(z,z') .
\end{aligned}$$

• The relation $\mathrm{Err}_{C}$ contains all pairs of cell values from $V \times V$ that cannot be adjacent in any configuration (for example, it contains the pair $(z,z')$, where $z = (a,b,q,c)$ and $z' = (b',q,c)$, with $b \neq b'$). Accordingly, a violation of the adjacency constraint on two consecutive cells of some configuration can be detected by the following CQ:

$$Q_{C} = \exists x\, y\, y'\, z\, z'\; T'(x,y,y') \land C(x,y,z) \land C(x,y',z') \land \mathrm{Err}_{C}(z,z') .$$

• The relation $\mathrm{Err}_{S^{\exists}}$ contains all pairs of cell values from $V \times V$ that cannot appear in the same position of the tape in an existential configuration and its immediate successor (this relation is constructed using the transition function of the Turing machine). A violation of the corresponding constraint can be exposed by the following CQ:

$$Q_{S^{\exists}} = \exists x\, x'\, y\, z\, z'\; S^{\exists}(x,x') \land C(x,y,z) \land C(x',y,z') \land \mathrm{Err}_{S^{\exists}}(z,z') .$$

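• Similarly, the relation $\mathrm{Err}_{S^{\forall}_1}$ (resp., $\mathrm{Err}_{S^{\forall}_2}$) contains those pairs of values $(z,z')$ that cannot appear in the same position of the tape of a universal configuration and of its first (resp., second) successor. The corresponding CQs are

$$\begin{aligned}
Q_{S^{\forall}_1} &= \exists x\, x_1\, y\, z\, z'\; S^{\forall}_1(x,x_1) \land C(x,y,z) \land C(x_1,y,z') \land \mathrm{Err}_{S^{\forall}_1}(z,z') \\
Q_{S^{\forall}_2} &= \exists x\, x_2\, y\, z\, z'\; S^{\forall}_2(x,x_2) \land C(x,y,z) \land C(x_2,y,z') \land \mathrm{Err}_{S^{\forall}_2}(z,z') .
\end{aligned}$$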
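To make the "complement" construction of the first bullet concrete, the following Python sketch computes $\mathrm{Err}_{I,0}$ and $\mathrm{Err}_{I}$ from a given set V of cell values. It is an illustration under our own assumptions: cell values use a hypothetical tagged-tuple encoding ("head", "left", "right", "plain" for the four kinds of values), and the function name is not from the source. The other error relations are built the same way, as complements of the admissible pairs derived from the transition function.

```python
def initial_config_errors(V, blank, q0):
    """Err_{I,0} and Err_I as complements of a few allowed pairs in V x V.

    Uses the hypothetical tagged-tuple encoding of cell values:
    ("head", a, b, q, c), ("left", a, q, b), ("right", b, c, q), ("plain", b).
    """
    z0 = ("right", blank, blank, q0)        # (⊥, ⊥, q0): head on the right neighbour
    z1 = ("head", blank, blank, q0, blank)  # (⊥, ⊥, q0, ⊥): head on this cell
    z2 = ("left", blank, q0, blank)         # (⊥, q0, ⊥): head on the left neighbour
    z3 = ("plain", blank)                   # ⊥: every remaining cell is blank
    all_pairs = {(z, w) for z in V for w in V}
    err_i0 = all_pairs - {(z0, z1)}
    err_i = all_pairs - {(z0, z1), (z1, z2), (z2, z3), (z3, z3)}
    return err_i0, err_i


V = {("right", "_", "_", "q0"), ("head", "_", "_", "q0", "_"),
     ("left", "_", "q0", "_"), ("plain", "_"), ("plain", "1")}
err_i0, err_i = initial_config_errors(V, blank="_", q0="q0")
print(len(err_i0), len(err_i))  # 24 21
```

Since both relations are subsets of $V \times V$, their size stays polynomial in M.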

It now remains to check whether the Turing machine M reaches the rejecting state $q_{rej}$ along some path of its computation tree. This can be done by introducing a last visible relation $V_{rej}$ that contains all cell values of the form $(a,b,q_{rej},c)$, for some $a,b,c \in \Sigma$. The CQ that checks this property is

$$Q_{rej} = \exists x\, y\, z\; C(x,y,z) \land V_{rej}(z) .$$

The final query is thus the disjunction of all the above CQs:

$$Q = Q_{I,0} \lor Q_{I} \lor Q_{C} \lor Q_{S^{\exists}} \lor Q_{S^{\forall}_1} \lor Q_{S^{\forall}_2} \lor Q_{rej} .$$
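The reduction relies on whether a full instance violates every disjunct of Q. The following Python sketch, our own illustration rather than part of the reduction, evaluates a Boolean UCQ on a finite instance by naive backtracking search for a homomorphism. The atom representation and the names `cq_holds` and `ucq_holds` are hypothetical; the instances arising in the reduction may be infinite, so this only illustrates the semantics of Q.

```python
def cq_holds(atoms, instance):
    """Naive backtracking search for a match of a Boolean CQ in a finite instance.

    atoms: list of (relation, terms); a term starting with '?' is a variable.
    instance: dict mapping relation names to sets of tuples.
    """
    def extend(i, env):
        if i == len(atoms):
            return True                      # every atom has been matched
        rel, terms = atoms[i]
        for fact in instance.get(rel, ()):
            if len(fact) != len(terms):
                continue
            new_env, ok = dict(env), True
            for term, value in zip(terms, fact):
                if isinstance(term, str) and term.startswith("?"):
                    # bind the variable, or check consistency with an earlier binding
                    if new_env.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:          # constants must match exactly
                    ok = False
                    break
            if ok and extend(i + 1, new_env):
                return True
        return False

    return extend(0, {})


def ucq_holds(cqs, instance):
    """A Boolean UCQ holds iff at least one of its CQs holds."""
    return any(cq_holds(cq, instance) for cq in cqs)


# Q_rej = ∃x y z C(x, y, z) ∧ V_rej(z)
q_rej = [("C", ("?x", "?y", "?z")), ("V_rej", ("?z",))]
inst = {"C": {("x0", "y0", "ok")}, "V_rej": {("bad",)}}
print(ucq_holds([q_rej], inst))  # False
inst["C"].add(("x1", "y0", "bad"))
print(ucq_holds([q_rej], inst))  # True
```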

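We are now ready to give the reduction. Denote by $\mathcal{V}_M$ the instance that captures the intended semantics of the visible relations T, $T_0$, V, $C_0$, $\mathrm{Err}_{I,0}$, $\mathrm{Err}_{I}$, $\mathrm{Err}_{C}$, $\mathrm{Err}_{S^{\exists}}$, $\mathrm{Err}_{S^{\forall}_1}$, $\mathrm{Err}_{S^{\forall}_2}$, and $V_{rej}$. We have described these semantics above, and argued why they can be computed in polynomial time.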
Below, we prove that the Turing machine M has a successful computation tree (where all paths visit the control state $q_{acc}$) iff $\mathrm{PQI}(Q,C,S,\mathcal{V}_M) = \mathit{false}$.

Suppose that M has a successful computation tree ρ. On the basis of ρ, and by following the intended semantics of the hidden relations C, $T'$, $T'_0$, $S^{\exists}$, $S^{\forall}_1$, $S^{\forall}_2$, we can easily construct a full instance $\mathcal{F}$ that satisfies the constraints in C and agrees with $\mathcal{V}_M$ on the visible part. Furthermore, because we correctly encode a successful computation tree of M, the instance $\mathcal{F}$ violates every CQ of Q, and hence $\mathrm{PQI}(Q,C,S,\mathcal{V}_M) = \mathit{false}$.

Conversely, suppose that $\mathrm{PQI}(Q,C,S,\mathcal{V}_M) = \mathit{false}$. By Proposition 12, we know that ChaseSvis(C, S, $\mathcal{V}_M$) contains an S-instance $\mathcal{F}$ that violates the UCQ Q. By construction, this instance $\mathcal{F}$ satisfies the constraints in C and agrees with $\mathcal{V}_M$ on the visible part. We show that the instance $\mathcal{F}$ witnesses the fact that M has a successful run. In doing so, we can exploit the fact that $\mathcal{F}$ is constructed using the chase procedure; in particular, the hidden relations $S^{\exists}$, $S^{\forall}_1$, $S^{\forall}_2$ have a tree-shaped structure, in which every configuration is represented by a unique identifier. The identifier $x_0$ of the initial configuration is explicitly given in the visible relation $C_0$. The content of the first cell of this initial configuration can be easily derived from the series of inclusion dependencies $C_0(x) \to \exists x'\, S^{\exists}(x,x')$, $S^{\exists}(x,x') \to \exists y\, y'\, T'_0(x,y,y')$, $T'_0(x,y,y') \to T'(x,y,y')$, and $T'(x,y,y') \to \exists z\, C(x,y,z)$. Note that the fact that the CQs $Q_{I,0}$ and $Q_{I}$ are violated guarantees that the content of this initial configuration is as expected. Similarly, one can derive the content of the remaining cells by inductively applying the constraints $T'(x,y,y') \to \exists y''\, T'(x,y',y'')$ and $T'(x,y,y') \to \exists z\, C(x,y,z)$, and by recalling that the CQ $Q_{C}$ is violated. As for the successor configuration(s), one can discover their identifier(s) using the constraints with $S^{\exists}$, $S^{\forall}_1$, $S^{\forall}_2$ in the right-hand side, and applying similar arguments as before. The fact that the CQs $Q_{S^{\exists}}$, $Q_{S^{\forall}_1}$, $Q_{S^{\forall}_2}$ are violated guarantees that the resulting structure of configurations is a correct computation tree of M. Finally, because the CQ $Q_{rej}$ is also violated, the computation tree of M must be successful.

We have just shown the ExpTime hardness result for the data complexity of 
the PQI problem, using a UCQ as query. To finish the proof of Theorem 7, we 
show that PQI problems for UCQs can be reduced to the analogous problems 
for CQs. 

Lemma 8. Let $Q = \bigvee_i Q_i$ be a Boolean UCQ, let C be a set of constraints over a schema S, and let V be an instance for the visible part of S. There exist a schema S', a CQ Q', a set C' of constraints, and an instance V' for the visible part of S', all having linear size with respect to the original objects S, Q, C, and V, such that $\mathrm{PQI}(Q,C,S,V) = \mathit{true}$ iff $\mathrm{PQI}(Q',C',S',V') = \mathit{true}$.

Moreover, the transformation preserves all constraint languages considered in 
our results (e.g., inclusion dependencies). 

Proof. The general idea is as follows. For every visible (resp., hidden) relation R of S of arity k, we add to S' a corresponding visible (resp., hidden) relation R' of arity k + 1. The idea is that the additional attribute of R' represents a truth value, e.g. 0 or 1, which indicates the presence of a tuple in the original relation R. For example, the fact $R'(\bar a, 1)$ indicates the presence of the tuple $\bar a$ in the relation R. The constraints and the conjunctive queries in Q will be rewritten accordingly, so as to propagate these truth values. Thanks to this, we can simulate the disjunctions in the query Q by using conjunctions and an appropriate look-up table Or.

Formally, the new schema S' contains a copy R' of each relation R in S, where R' is visible iff R is visible, plus the visible relations Or, Zero, and One of arities 3, 1, and 1, respectively. For each constraint

$$R(\bar x) \to \exists \bar y\; S_1(\bar z_1) \land \dots \land S_m(\bar z_m)$$

in C, where $\bar z_1, \dots, \bar z_m$ are sequences of variables or constants from $\bar x, \bar y$, we add to C' a corresponding constraint of the form

$$R'(\bar x, b) \to \exists \bar y\; S'_1(\bar z_1, b) \land \dots \land S'_m(\bar z_m, b) .$$

Similarly, every Boolean CQ $Q_i = \exists \bar y\; S_1(\bar z_1) \land \dots \land S_m(\bar z_m)$ of Q is rewritten as $Q'_i(b) = \exists \bar y\; S'_1(\bar z_1, b) \land \dots \land S'_m(\bar z_m, b)$. Let n be the number of CQs in Q. We define the Boolean CQ

$$Q' = \exists b_1 \dots b_n\, c_0\, c_1 \dots c_n\; \mathrm{Zero}(c_0) \land \bigwedge_{i=1}^{n} \bigl( Q'_i(b_i) \land \mathrm{Or}(c_{i-1}, b_i, c_i) \bigr) \land \mathrm{One}(c_n) .$$

Finally, we construct the visible instance V' as follows. We choose some fresh values 0, 1, and ⊥ that do not belong to the active domain of V. First, we include in V' the facts Or(0,0,0), Or(0,1,1), Or(1,0,1), Or(1,1,1), Zero(0), and One(1). Then, for each visible relation R of S, we add to V' the fact $R'(\bot,\dots,\bot,0)$, as well as every fact of the form $R'(\bar a, 1)$, where $R(\bar a)$ is a fact in V. Note that the presence of the facts $R'(\bot,\dots,\bot,0)$ in V' guarantees that the rewritten CQs $Q'_i(b)$ can always be satisfied by letting b = 0 and by extending V' with hidden facts of the form $S'(\bot,\dots,\bot,0)$.
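The chain of Or atoms in Q' is what simulates the disjunction using only conjunctions. The Python sketch below, an illustration under our own assumptions, propagates the possible values of $c_0, \dots, c_n$ and shows that Q' holds exactly when some disjunct $Q_i$ holds. It assumes that Or is the full Boolean disjunction table on {0, 1}, including Or(0,0,0): without that fact, a chain starting from $c_0 = 0$ could only continue with $b_1 = 1$. The function name `q_prime_holds` is hypothetical.

```python
# Assumed look-up table: Or contains exactly the Boolean disjunction on {0, 1}.
OR_FACTS = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)}


def q_prime_holds(disjunct_holds):
    """Decide Q' from the list saying which original disjuncts Q_i hold.

    Q'_i(0) can always be satisfied via the (⊥, ..., ⊥, 0) facts, while
    Q'_i(1) is satisfiable exactly when Q_i holds.
    """
    reachable = {0}  # Zero(c_0) forces c_0 = 0
    for holds in disjunct_holds:
        allowed_b = {0, 1} if holds else {0}
        # values c_i with Or(c_{i-1}, b_i, c_i) for a reachable c_{i-1} and an allowed b_i
        reachable = {c for (prev, b, c) in OR_FACTS
                     if prev in reachable and b in allowed_b}
    return 1 in reachable  # One(c_n) forces c_n = 1


print(q_prime_holds([False, True, False]))  # True
print(q_prime_holds([False, False]))        # False
```

Once the value 1 becomes reachable it stays reachable, which mirrors the fact that a single satisfied disjunct suffices.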

We are now ready to prove that $\mathrm{PQI}(Q,C,S,V) = \mathit{true}$ iff $\mathrm{PQI}(Q',C',S',V') = \mathit{true}$. Suppose that $\mathrm{PQI}(Q',C',S',V') = \mathit{true}$ and consider an S-instance $\mathcal{F}$ that satisfies the constraints in C and such that $\mathrm{Visible}(\mathcal{F}) = V$. Without loss of generality, we can assume that the active domain of $\mathcal{F}$ does not contain the values 0, 1, and ⊥. We can easily transform $\mathcal{F}$ into an S'-instance $\mathcal{F}'$ by simply expanding all facts with an additional attribute valued 1 and by adding new facts of the form $R'(\bot,\dots,\bot,0)$, for every relation R of S, and the new visible facts Or(0,0,0), Or(0,1,1), Or(1,0,1), Or(1,1,1), Zero(0), and One(1). Note that $\mathrm{Visible}(\mathcal{F}') = V'$ and that $\mathcal{F}'$ satisfies the constraints in C'. Since $\mathrm{PQI}(Q',C',S',V') = \mathit{true}$, we know that $\mathcal{F}'$ satisfies the query Q'. In particular, it must satisfy some CQ $Q'_i(b)$ with b = 1, and this implies that $\mathcal{F}$ satisfies $Q_i$, and hence the UCQ $Q = \bigvee_i Q_i$. Conversely, suppose that $\mathrm{PQI}(Q,C,S,V) = \mathit{true}$ and consider an S'-instance $\mathcal{F}'$ that satisfies the constraints in C' and such that $\mathrm{Visible}(\mathcal{F}') = V'$. By selecting from $\mathcal{F}'$ only the facts of the form $R'(\bar a, 1)$, with $R \in S$, and by projecting away the last attribute, we obtain an S-instance $\mathcal{F}$ that satisfies the constraints in C and such that $\mathrm{Visible}(\mathcal{F}) = V$. Finally, since $\mathrm{PQI}(Q,C,S,V) = \mathit{true}$, we know that $\mathcal{F}$ satisfies at least one CQ $Q_i$ of Q, and hence $\mathcal{F}'$ satisfies Q'. □
Applying the lemma above, we have proven Theorem 7.


□ 


We note that this data complexity lower bound requires a schema with arity above 2. It thus contrasts with results of Franconi et al. [FISH], which show that the data complexity lies in coNP for arity 2. In fact, if we move up from IDs to linear TGDs, we can adapt the argument for the above result to show ExpTime-hardness even for arity 2.

We show that the 2ExpTime combined complexity upper bound is tight 
even for IDs. 

Theorem 9. Checking $\mathrm{PQI}(Q,C,S,V)$, where Q ranges over CQs and C over sets of inclusion dependencies, is 2ExpTime-hard for combined complexity.

Proof. This proof builds on ideas of the previous proof for Theorem 7. Specifically, we reduce the acceptance problem for an alternating ExpSpace Turing machine M to the negation of $\mathrm{PQI}(Q,C,S,V)$, where Q is a Boolean UCQ and C consists of inclusion dependencies. Note that to further reduce the problem to a Positive Query Implication problem with a Boolean CQ, one can exploit Lemma 8.

The additional technical difficulty here is to encode a tape of exponential size. 
Of course, this cannot be done succinctly using an instance with visible relations. 
We can however represent the exponential tape by the leaves of a full binary tree. 
More precisely, we fix an alternating ExpSpace Turing machine M of size n and we construct a full binary tree of height n, as follows. The root of the tree is encoded by a visible unary relation $N_0$, which is initialized with a single value $y_0$. To encode the nodes of the tree at the lower levels, we use a series of hidden unary relations $N_1, \dots, N_n$. Similarly, we encode the edges at each level of the binary tree using a series of hidden binary relations $E_{1,\mathit{left}}, E_{1,\mathit{right}}, \dots, E_{n,\mathit{left}}, E_{n,\mathit{right}}$.
The corresponding constraints are easily stated: for every $1 \le i \le n$,

$$\begin{aligned}
N_{i-1}(y) &\to \exists y'\, E_{i,\mathit{left}}(y,y') & \qquad E_{i,\mathit{left}}(y,y') &\to N_i(y') \\
N_{i-1}(y) &\to \exists y'\, E_{i,\mathit{right}}(y,y') & \qquad E_{i,\mathit{right}}(y,y') &\to N_i(y') .
\end{aligned}$$


Note that, in a universal instance that satisfies the above constraints (e.g., an instance obtained from the chase procedure), every cell of the tape can be identified with a unique element of the relation $N_n$. The paths in the full binary tree are naturally ordered lexicographically, and so are the cells. Later, we need to access this ordering and, in particular, we need to write a UCQ that checks whether two cells are adjacent according to the ordering. For this, we need to make the encoding a bit redundant. We first introduce two visible unary relations, $D_{\mathit{left}}$ and $D_{\mathit{right}}$, that are instantiated, respectively, with the singleton {left} and the singleton {right}. The values left and right that appear in these sets are used to encode whether a certain node of the binary tree is a left or a right successor of its parent. First, this information is collected in a series of hidden binary relations $D_{1,\mathit{left}}, D_{1,\mathit{right}}, \dots, D_{n,\mathit{left}}, D_{n,\mathit{right}}$ (two relations for each level of the tree, except for the top level). Then, every pair of relations $D_{i,\mathit{left}}$ and $D_{i,\mathit{right}}$ is unioned into a new relation $D_i$, which is also hidden. The objective is to have a collection of facts of the form $D_i(y,d)$, where y is a node, i is its level, and d is either left or right depending on whether y is a left or right successor of its parent. It is easy to see that this objective is achieved when chasing the following constraints:


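$$\begin{aligned}
E_{i,\mathit{left}}(y,y') &\to \exists d\, D_{i,\mathit{left}}(y',d) & \quad D_{i,\mathit{left}}(y,d) &\to D_{\mathit{left}}(d) & \quad D_{i,\mathit{left}}(y,d) &\to D_i(y,d) \\
E_{i,\mathit{right}}(y,y') &\to \exists d\, D_{i,\mathit{right}}(y',d) & \quad D_{i,\mathit{right}}(y,d) &\to D_{\mathit{right}}(d) & \quad D_{i,\mathit{right}}(y,d) &\to D_i(y,d) \qquad (1 \le i \le n) .
\end{aligned}$$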


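To explain how we can take advantage of the above redundant encoding, we first give the formula that checks whether two cells y and y' are adjacent, with y' immediately to the right of y. The idea is to find the level $0 \le i < n$ of the tree at which the access paths to the leaves y and y' branch off: the two paths agree down to level i; at level i+1, the access path for y follows the direction left and the access path for y' follows the direction right; after this level, the access path for y must continue following the direction right, and the access path for y' must continue following the direction left. The formula is the disjunction over all $i = 0, \dots, n-1$ of the following CQs:

$$\begin{aligned}
Q_{\mathit{adj},i}(y_n,y'_n) = \exists\, & y_0 \dots y_{n-1}\, y'_0 \dots y'_{n-1}\, d_1 \dots d_i\, d\, d' \\
& N_0(y_0) \land \textstyle\bigwedge_{0<j\le i} \bigl(E_j(y_{j-1},y_j) \land D_j(y_j,d_j)\bigr) \land {} \\
& N_0(y'_0) \land \textstyle\bigwedge_{0<j\le i} \bigl(E_j(y'_{j-1},y'_j) \land D_j(y'_j,d_j)\bigr) \land {} \\
& D_{\mathit{right}}(d) \land E_{i+1}(y_i,y_{i+1}) \land D_{i+1}(y_{i+1},d') \land \textstyle\bigwedge_{i+1<j\le n} \bigl(E_j(y_{j-1},y_j) \land D_j(y_j,d)\bigr) \land {} \\
& D_{\mathit{left}}(d') \land E_{i+1}(y'_i,y'_{i+1}) \land D_{i+1}(y'_{i+1},d) \land \textstyle\bigwedge_{i+1<j\le n} \bigl(E_j(y'_{j-1},y'_j) \land D_j(y'_j,d')\bigr) .
\end{aligned}$$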


It is not difficult to see that the formula correctly defines those pairs of cells 
that are adjacent in the tape, under the usual assumption that the instance is 
generated by chasing the constraints, namely, that the instance is universal. 
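The branching pattern behind the adjacency test can be checked on concrete access paths. The Python sketch below is our own illustration: it represents a leaf by its list of directions (0 for left, 1 for right) and tests, for each level i, whether the two paths agree on the first i levels, y turns left and y' turns right at level i+1, and afterwards y goes only right while y' goes only left. Reading paths as binary numbers, this holds exactly when the leaf of y' immediately follows that of y, which the usage example checks for height 3. The function names are hypothetical.

```python
def leaf_path(index, n):
    """Directions from the root to leaf `index` of a height-n tree (0 = left, 1 = right)."""
    return [(index >> (n - 1 - j)) & 1 for j in range(n)]


def adjacent(p, q):
    """Disjunction over i of the branching pattern behind Q_adj,i."""
    n = len(p)
    for i in range(n):
        # p[:i] and q[:i] are the directions at levels 1..i; p[i] is level i+1.
        if (p[:i] == q[:i]
                and p[i] == 0 and q[i] == 1            # y turns left, y' turns right
                and all(d == 1 for d in p[i + 1:])     # then y always goes right
                and all(d == 0 for d in q[i + 1:])):   # and y' always goes left
            return True
    return False


n = 3
pairs = [(a, b) for a in range(2 ** n) for b in range(2 ** n)
         if adjacent(leaf_path(a, n), leaf_path(b, n))]
print(pairs)  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]
```

This is the usual binary increment: the suffix $0\,1^k$ of y turns into the suffix $1\,0^k$ of y'.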

Now that we have constructed a tape of exponential length and we know how to check adjacency of its cells, we proceed as in the proof of Theorem 7.

We begin by encoding a single configuration of M. Intuitively, this is done by creating a copy of the full binary tree and expanding it with the identifier of the configuration and the content of the cells (i.e. the cell values). For this, we introduce a series of hidden binary relations $N'_0, \dots, N'_n$, a series of hidden ternary relations $E'_{1,\mathit{left}}, E'_{1,\mathit{right}}, \dots, E'_{n,\mathit{left}}, E'_{n,\mathit{right}}$, and an additional hidden ternary relation C. The content of the relations $N'_0, \dots, N'_n$ and $E'_{1,\mathit{left}}, E'_{1,\mathit{right}}, \dots, E'_{n,\mathit{left}}, E'_{n,\mathit{right}}$ will be obtained by copying the content of $N_0, \dots, N_n$ and $E_{1,\mathit{left}}, E_{1,\mathit{right}}, \dots, E_{n,\mathit{left}}, E_{n,\mathit{right}}$ and by annotating it with the identifier x of the configuration. Formally, this is done through the following constraints, for every $1 \le i \le n$:

$$\begin{aligned}
N'_0(x,y) &\to N_0(y) & \qquad N'_i(x,y) &\to N_i(y) \\
E'_{i,\mathit{left}}(x,y,y') &\to E_{i,\mathit{left}}(y,y') & \qquad E'_{i,\mathit{right}}(x,y,y') &\to E_{i,\mathit{right}}(y,y') \\
N'_{i-1}(x,y) &\to \exists y'\, E'_{i,\mathit{left}}(x,y,y') & \qquad N'_{i-1}(x,y) &\to \exists y'\, E'_{i,\mathit{right}}(x,y,y') \\
E'_{i,\mathit{left}}(x,y,y') &\to N'_i(x,y') & \qquad E'_{i,\mathit{right}}(x,y,y') &\to N'_i(x,y') .
\end{aligned}$$


The ternary relation C is instead used to represent the values of the tape cells. Intuitively, a fact of the form $C(x,y,z)$ indicates that, in the configuration identified by x, the cell y has value z. As usual (cf. proof of Theorem 7), we define cell values as elements from a visible unary relation

$$V = (\Sigma \times \Sigma \times Q \times \Sigma) \uplus (\Sigma \times Q \times \Sigma) \uplus (\Sigma \times \Sigma \times Q) \uplus \Sigma ,$$

where Σ is the alphabet of the Turing machine and Q is the set of its control states. We recall that if a cell has value (a,b,q,c), this means that its content is b, the control state of M is q, the head is on this cell, and the neighbouring cells have labels a and c. Analogous semantics are given for the values of the form (a,q,b) (resp., (b,c,q)), which must be associated with cells that are immediately to the right (resp., to the left) of the head of the Turing machine. We enforce the following constraints over $N'_n$, C, and V:

$$N'_n(x,y) \to \exists z\, C(x,y,z) \qquad\qquad C(x,y,z) \to V(z) .$$

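We now turn to the encoding of the computation tree of M. This is almost the same as in the proof of Theorem 7. We introduce a visible unary relation $C_0$, which contains the identifier of the initial configuration, and three hidden binary relations $S^{\exists}$, $S^{\forall}_1$, and $S^{\forall}_2$. A fact of the form $S^{\exists}(x,x')$ (resp., $S^{\forall}_1(x,x_1)$ or $S^{\forall}_2(x,x_2)$) represents a transition from an existential (resp., universal) configuration x to a universal (resp., existential) configuration x' (resp., $x_1$ or $x_2$). We then enforce the following constraints:

$$\begin{aligned}
C_0(x) &\to \exists x'\, S^{\exists}(x,x') & & \\
S^{\exists}(x,x') &\to \exists x_1\, S^{\forall}_1(x',x_1) & \qquad S^{\exists}(x,x') &\to \exists y\, N'_0(x,y) \\
S^{\exists}(x,x') &\to \exists x_2\, S^{\forall}_2(x',x_2) & & \\
S^{\forall}_1(x,x_1) &\to \exists y\, N'_0(x,y) & \qquad S^{\forall}_1(x,x_1) &\to \exists x'\, S^{\exists}(x_1,x') \\
S^{\forall}_2(x,x_2) &\to \exists y\, N'_0(x,y) & \qquad S^{\forall}_2(x,x_2) &\to \exists x'\, S^{\exists}(x_2,x') .
\end{aligned}$$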


Now, we turn to describing how to detect badly formed encodings of the computation tree of M. We introduce the visible relations $\mathrm{Err}_{I,0}$, $\mathrm{Err}_{I}$, $\mathrm{Err}_{C}$, $\mathrm{Err}_{S^{\exists}}$, $\mathrm{Err}_{S^{\forall}_1}$, and $\mathrm{Err}_{S^{\forall}_2}$, whose instances are defined exactly as in the proof of Theorem 7.

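• Recall that the relation $\mathrm{Err}_{I,0}$ contains all pairs in $V \times V$ except $(z_0,z_1)$, where $z_0 = (\bot,\bot,q_0)$ is the value of the first cell of the initial configuration and $z_1 = (\bot,\bot,q_0,\bot)$ is the value of the second cell of the initial configuration. We can detect whether the first two cells of the initial configuration are badly formed using the CQ

$$\begin{aligned}
Q_{I,0} = \exists\, & x\, y_0 \dots y_n\, y'_n\, d\, d'\, z\, z' \\
& C_0(x) \land D_{\mathit{left}}(d) \land D_{\mathit{right}}(d') \land N'_0(x,y_0) \land {} \\
& \textstyle\bigwedge_{0<i\le n} \bigl(E_i(y_{i-1},y_i) \land D_i(y_i,d)\bigr) \land E_n(y_{n-1},y'_n) \land D_n(y'_n,d') \land {} \\
& C(x,y_n,z) \land C(x,y'_n,z') \land \mathrm{Err}_{I,0}(z,z') .
\end{aligned}$$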

Similarly, the relation $\mathrm{Err}_{I}$ contains those pairs of values that cannot occur in two adjacent cells of the initial configuration. We can detect whether two adjacent cells of the initial configuration contain wrong values by a disjunction of CQs that are built up using the previous formulas $Q_{\mathit{adj},i}$, for $i = 0, \dots, n-1$. The resulting UCQ is

$$Q_{I} = \bigvee_{0 \le i < n} \exists x\, y\, y'\, z\, z'\; C_0(x) \land N'_n(x,y) \land N'_n(x,y') \land Q_{\mathit{adj},i}(y,y') \land C(x,y,z) \land C(x,y',z') \land \mathrm{Err}_{I}(z,z') .$$

• The relation $\mathrm{Err}_{C}$ contains those pairs of cell values $(z,z')$ that cannot be adjacent in any configuration of M. A violation of the adjacency constraint can be detected by the following UCQ:

$$Q_{C} = \bigvee_{0 \le i < n} \exists x\, y\, y'\, z\, z'\; N'_n(x,y) \land N'_n(x,y') \land Q_{\mathit{adj},i}(y,y') \land C(x,y,z) \land C(x,y',z') \land \mathrm{Err}_{C}(z,z') .$$

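• For the violations that involve values associated with the same position of the tape in two subsequent configurations, we use a disjunction of the following CQs:

$$\begin{aligned}
Q_{S^{\exists}} &= \exists x\, x'\, y\, z\, z'\; S^{\exists}(x,x') \land C(x,y,z) \land C(x',y,z') \land \mathrm{Err}_{S^{\exists}}(z,z') \\
Q_{S^{\forall}_1} &= \exists x\, x_1\, y\, z\, z'\; S^{\forall}_1(x,x_1) \land C(x,y,z) \land C(x_1,y,z') \land \mathrm{Err}_{S^{\forall}_1}(z,z') \\
Q_{S^{\forall}_2} &= \exists x\, x_2\, y\, z\, z'\; S^{\forall}_2(x,x_2) \land C(x,y,z) \land C(x_2,y,z') \land \mathrm{Err}_{S^{\forall}_2}(z,z') .
\end{aligned}$$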

As usual, we can further check whether the Turing machine M reaches the rejecting state $q_{rej}$ along some path of its computation tree. This is done by the CQ

$$Q_{rej} = \exists x\, y\, z\; C(x,y,z) \land V_{rej}(z) ,$$

where $V_{rej}$ is an additional visible relation that contains all cell values of the form $(a,b,q_{rej},c)$, for some $a,b,c \in \Sigma$.

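Let Q be the disjunction of all the previous CQs and let $\mathcal{V}_M$ be the instance that captures the intended semantics of the visible relations $N_0$, $D_{\mathit{left}}$, $D_{\mathit{right}}$, V, $C_0$, $\mathrm{Err}_{I,0}$, $\mathrm{Err}_{I}$, $\mathrm{Err}_{C}$, $\mathrm{Err}_{S^{\exists}}$, $\mathrm{Err}_{S^{\forall}_1}$, $\mathrm{Err}_{S^{\forall}_2}$, and $V_{rej}$. To conclude, we argue along the same lines as in the proof of Theorem 7 that M has a successful computation tree iff $\mathrm{PQI}(Q,C,S,\mathcal{V}_M) = \mathit{false}$. □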


3.2 Existence problems 

We now turn to the schema-level problem ∃PQI. Let $\mathcal{V}_{\{a\}}$ be a fixed instance for the visible part of a schema S whose domain contains the single value a and whose visible relations are singleton relations of the form $\{(a,\dots,a)\}$. For certain constraint languages, we will show that, whenever $\exists\mathrm{PQI}(Q,C,S) = \mathit{true}$, the witnessing instance can be taken to be $\mathcal{V}_{\{a\}}$. This can be viewed as an extension of the “critical instance” method, which has been applied previously to chase termination problems: Proposition 3.7 of Marnette and Geerts [MG10] states a related result for disjunctive TGDs in isolation; Gogacz and Marcinkowski [GM14] call such an instance a “well of positivity”. The following shows that the technique applies to TGDs and EGDs without constants.
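For concreteness, the following Python sketch builds the critical instance $\mathcal{V}_{\{a\}}$ from a description of the visible relations and their arities. The dictionary-based representation and the function name are assumptions made for illustration only.

```python
def critical_visible_instance(visible_arities, a="a"):
    """V_{a}: every visible relation of arity k holds exactly the tuple (a, ..., a)."""
    return {rel: {(a,) * k} for rel, k in visible_arities.items()}


print(critical_visible_instance({"T": 2, "C0": 1}))
# {'T': {('a', 'a')}, 'C0': {('a',)}}
```

A nullary visible relation is mapped to {()}, that is, it is set to true.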

Theorem 10. For every Boolean UCQ Q and every set C of TGDs and EGDs without constants, $\exists\mathrm{PQI}(Q,C,S) = \mathit{true}$ iff $\mathrm{PQI}(Q,C,S,\mathcal{V}_{\{a\}}) = \mathit{true}$.

We prove the above theorem first for constraints consisting only of TGDs 
without constants; then we will show how to generalize the proof in the 
additional presence of EGDs without constants. First of all, recall that, by 
introducing additional invisible relations, we can assume, without loss of 
generality, that all TGDs have exactly one atom in the right-hand side. 

Next, we introduce a variant of the chase procedure that returns a collection of instances (not necessarily finite). As for the classical chase, the procedure receives as input a relational schema S, some constraints C, and an initial instance $\mathcal{F}_0$ for the schema S, which does not need to satisfy the constraints in C. The procedure chases the constraints starting from the instance $\mathcal{F}_0$, guaranteeing at the same time that the visible relations of the constructed instances agree with $\mathcal{F}_0$. This variant of the chase will be used to prove Theorem 10, as well as other results related to the ∃NQI problem.

Formally, the procedure builds a chase tree of instances, starting with the singleton tree consisting of the input S-instance $\mathcal{F}_0$ and extending the tree by repeatedly applying the following steps. It chooses an instance K at some leaf of the current tree, a dependency $R_1(\bar x_1) \land \dots \land R_m(\bar x_m) \to \exists \bar y\; S(\bar z)$, where $\bar z$ is a sequence of (possibly repeated) variables from $\bar x_1, \dots, \bar x_m, \bar y$, and a homomorphism f that maps $R_1(\bar x_1), \dots, R_m(\bar x_m)$ to some facts in K. Then, the procedure constructs a new instance from K by adding the fact $S(f'(\bar z))$, where f' is an extension of f that maps, in an injective way, the existentially quantified variables in $\bar y$ to some fresh null values (this can be seen as a classical chase step). Immediately after, and only when the relation S is visible, the procedure replaces the instance $K' = K \cup \{S(f'(\bar z))\}$ with copies of it of the form $g(K')$ such that $\mathrm{Visible}(g(K')) = \mathrm{Visible}(\mathcal{F}_0)$, for all possible homomorphisms g that map the elements of $f'(\bar z)$ to some values in the active domain $\{a_1, \dots, a_n\}$ of the visible instance $\mathrm{Visible}(\mathcal{F}_0)$ (this can be seen as a chase step for disjunctive EGDs of the form $S(\bar z) \to z(i) = a_1 \lor \dots \lor z(i) = a_n$). The resulting instances $g(K')$ are then appended as new children of K in the tree-shaped collection. In the special case where there are no homomorphisms g such that $\mathrm{Visible}(g(K')) = \mathrm{Visible}(\mathcal{F}_0)$, we append a “dummy instance” ⊥ as a child of K: this is used to represent the fact that the chase step from K led to an inconsistency (the dummy node will never be extended during the subsequent chase steps). If S is not visible, then the instance K' is simply appended as a new child of K.
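The following Python sketch illustrates a single step of this visibility-aware chase for a TGD with one head atom and no constants. It is our own simplification, not a definitive implementation: facts are pairs (relation, arguments), nulls are tagged tuples, the names `chase_step`, `new_null`, `BOTTOM`, and so on are hypothetical, and we read the homomorphisms g as acting on the nulls of the newly added fact (values already present are left unchanged). Fairness and the limit construction are not modelled.

```python
from itertools import count, product

BOTTOM = "⊥"        # represents the dummy (inconsistent) instance
_counter = count()


def new_null():
    """A fresh labelled null, represented as ('null', k)."""
    return ("null", next(_counter))


def is_null(value):
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "null"


def visible_part(K, visible):
    return frozenset(fact for fact in K if fact[0] in visible)


def chase_step(K, head, exist_vars, f, visible, F0):
    """Children of K in the chase tree after firing one TGD (a sketch).

    K: frozenset of facts (relation, args); head: (S, z) with z a tuple of
    variables (no constants); exist_vars: the variables y; f: dict mapping the
    body variables to values of K; visible: set of visible relation names.
    """
    f_ext = dict(f)
    for y in exist_vars:
        f_ext[y] = new_null()                  # injective choice of fresh nulls
    rel, z = head
    new_fact = (rel, tuple(f_ext[v] for v in z))
    K1 = K | {new_fact}                        # the classical chase step
    if rel not in visible:
        return [K1]
    target = visible_part(F0, visible)
    adom = sorted({v for (_, args) in target for v in args}, key=repr)
    nulls = sorted({v for v in new_fact[1] if is_null(v)}, key=repr)
    children = []
    for image in product(adom, repeat=len(nulls)):
        g = dict(zip(nulls, image))            # candidate homomorphism on the new nulls
        gK = frozenset((r, tuple(g.get(v, v) for v in args)) for (r, args) in K1)
        if visible_part(gK, visible) == target:
            children.append(gK)
    return children or [BOTTOM]                # no admissible g: inconsistency


F0 = frozenset({("V", ("a",)), ("V", ("b",))})
K = F0 | {("H", ("c",))}
# Fire H(x) -> ∃z V(z) with x mapped to c.
kids = chase_step(K, ("V", ("z",)), ["z"], {"x": "c"}, {"V"}, F0)
print(len(kids), all(kid == K for kid in kids))  # 2 True
```

In the usage example, the new visible fact V(z) must be mapped onto one of the existing visible facts, so the step branches into two children, one per value of the visible active domain.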

This process continues iteratively using a strategy that is “fair”, namely, that guarantees that whenever a dependency is applicable in a node on a maximal path of the chase tree, then it will be fired at some node (possibly later) on that same maximal path (unless the path ends with ⊥). In the limit, the process generates a possibly infinite tree-shaped collection of instances. It remains to complete the collection with “limits” in order to guarantee that the constraints are satisfied. Consider any infinite path $K_0, K_1, \dots$ in the tree (if there are any). It follows from the construction of the chase tree that the instances on the path form a chain of homomorphic embeddings $K_0 \to K_1 \to \cdots$. Such chains of homomorphic embeddings admit a natural notion of limit, which we denote by $\lim_{n \in \mathbb{N}} K_n$. We omit the details of this construction here; they can be found, for instance, in [CK90]. The limit instance $\lim_{n \in \mathbb{N}} K_n$ satisfies the constraints C. We denote by ChaseSvis(C, S, $\mathcal{F}_0$) the collection of all non-dummy instances that occur at the leaves of the chase tree, plus all limit instances of the form $\lim_{n \in \mathbb{N}} K_n$, where $K_0, K_1, \dots$ is an infinite path in the chase tree. This is well-defined only once the ordering of steps is chosen, but for the results below, which order is chosen will not matter, so we abuse notation by referring to ChaseSvis(C, S, $\mathcal{F}_0$) as a single object.

It is clear that every instance in $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{F}_0)$ satisfies the constraints
in $C$ and, in addition, agrees with $\mathcal{F}_0$ on the visible part of the schema. Below,
we prove that $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{F}_0)$ satisfies the following universal property:

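Lemma 11. Let $\mathcal{F}_0$ be an instance of a schema $\mathbf{S}$ and let $\mathcal{F}$ be another
instance over the same schema that contains $\mathcal{F}_0$, agrees with $\mathcal{F}_0$ on the visible part
(i.e. $\mathrm{Visible}(\mathcal{F}) = \mathrm{Visible}(\mathcal{F}_0)$), and satisfies a set $C$ of TGDs without constants.
Then, there exist an instance $K \in \mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{F}_0)$ and a homomorphism from
$K$ to $\mathcal{F}$.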

Proof. We consider the chase tree for $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{F}_0)$ and, based on the full
instance $\mathcal{F}$, we identify inside this chase tree a suitable path $K_0, K_1, \dots$ and a
corresponding sequence of homomorphisms $h_0, h_1, \dots$ such that, for all $n \in \mathbb{N}$,
$h_n$ maps $K_n$ to $\mathcal{F}$. Once these sequences are defined, the lemma will follow easily
by letting $K = \lim_{n \in \mathbb{N}} K_n$ and $h = \lim_{n \in \mathbb{N}} h_n$, that is, $h(a) = b$ if $h_n(a) = b$ for all
but finitely many $n \in \mathbb{N}$.
The base step is easy, as we simply let $K_0$ be the initial instance $\mathcal{F}_0$, which
appears at the root of the chase tree, and let $h_0$ be the identity. As for the
inductive step, suppose that $K_n$ and $h_n$ are defined for some step $n$, and suppose
that $R_1(\bar x_1) \wedge \dots \wedge R_m(\bar x_m) \rightarrow \exists \bar y\, S(\bar z)$ is the dependency that is applied at
node $K_n$, where $\bar z$ is a sequence of variables from $\bar x_1, \dots, \bar x_m, \bar y$. Let $R_1(f(\bar x_1)), \dots, R_m(f(\bar x_m))$
be the facts in the instance $K_n$ that have triggered the chase
step, where $f$ is a homomorphism from the variables in $\bar x_1, \dots, \bar x_m$ to the
domain of $K_n$. Since $\mathcal{F}$ satisfies the same dependency and contains the facts
$R_1(h_n(f(\bar x_1))), \dots, R_m(h_n(f(\bar x_m)))$, it must also contain a fact of the form
$S(h'(f'(\bar z)))$, where $f'$ is the extension of $f$ that is the identity on the
existentially quantified variables $\bar y$ and $h'$ is some extension of $h_n$ that maps the
variables $\bar y$ to some values in the domain of $\mathcal{F}$.

Now, to choose the next instance $K_{n+1}$, we distinguish two cases, depending
on whether $S$ is visible or not. If $S$ is not visible, then we know that the
chase step appends a single instance $K' = K_n \cup \{S(f'(\bar z))\}$ as a child of $K_n$;
accordingly, we let $K_{n+1} = K'$ and $h_{n+1} = h'$. Otherwise, if $S$ is visible, then
we observe that $h'$ is a homomorphism from $K' = K_n \cup \{S(f'(\bar z))\}$ to $\mathcal{F}$. In
particular, $h'$ maps the values $f'(\bar z)$ to some values in the active domain of the
visible part $\mathrm{Visible}(\mathcal{F}_0)$, and hence $h'(K')$ agrees with $\mathcal{F}_0$ on the visible part of
the schema. This implies that the chase step adds at least the instance $h'(K')$
as a child of $K_n$. Accordingly, we can define $K_{n+1} = h'(K')$ and $h_{n+1} = h'$.
Given the above constructions, it is easy to see that the homomorphism $h_{n+1}$
maps $K_{n+1}$ to $\mathcal{F}$.

Proceeding in this way, we either arrive at a leaf, in which case we are
done, or we obtain an infinite path $K_0 \rightarrow K_1 \rightarrow \cdots$ of the chase tree, with
homomorphisms $h_i : K_i \rightarrow \mathcal{F}$ such that $h_{i+1}$ extends $h_i$, for all $i \in \mathbb{N}$. It can
be shown that, in this case, the limit $\lim_{n \in \mathbb{N}} K_n$ also homomorphically maps to $\mathcal{F}$. □

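Proposition 12. If $Q$ is a Boolean UCQ, $C$ is a set of TGDs without constants
over a schema $\mathbf{S}$, and $\mathcal{V}$ is a visible instance, then $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}) = \mathit{true}$ iff
every instance $K$ in $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ satisfies $Q$.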

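Proof. Suppose that $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}) = \mathit{true}$ and recall that every instance in
$\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ satisfies the constraints in $C$ and agrees with $\mathcal{V}$ on the visible
part. In particular, this means that every instance in $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ satisfies
the query $Q$.

Conversely, suppose that $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}) = \mathit{false}$. This means that there is an
$\mathbf{S}$-instance $\mathcal{F}$ that has $\mathcal{V}$ as visible part and satisfies the constraints in $C$, but not the
query $Q$. By Lemma 11, letting $\mathcal{F}_0 = \mathcal{V}$, we get an instance $K \in \mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$
and a homomorphism from $K$ to $\mathcal{F}$. Since $Q$ is preserved under homomorphisms,
$K$ does not satisfy $Q$. □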

Next, we recall that the visible instance $\mathcal{V}_{\{a\}}$ is constructed over a singleton
active domain and that the constraints $C$ have no constants. This implies that there
are no disjunctive choices to perform while chasing the constraints starting from
$\mathcal{V}_{\{a\}}$. Moreover, it is easy to see that this chase always succeeds. That is, it
returns a collection $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ with exactly one instance; in particular,
$\mathcal{V}_{\{a\}}$ is a realizable instance. By a slight abuse of notation, we denote by
$\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ the unique instance in the collection $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$.
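Why a singleton active domain removes all disjunctive choices can be seen in a few lines of Python. This sketch assumes, as is usual for a critical instance (the definition of $\mathcal{V}_{\{a\}}$ appears earlier and is not restated here), that $\mathcal{V}_{\{a\}}$ contains the fact R(a, ..., a) for every visible relation R; the function names are ours and hypothetical. Collapsing every value to a is then a homomorphism from any visible instance into $\mathcal{V}_{\{a\}}$, which is the base case used in the proof of Lemma 13 below, and a map from k values into the one-element domain {a} is unique.

```python
from itertools import product


def critical_instance(visible_arities, a="a"):
    """Assumed shape of V_{a}: the all-`a` fact R(a, ..., a) for each visible relation R."""
    return {(rel, (a,) * k) for rel, k in visible_arities.items()}


def collapse(instance, a="a"):
    """Apply the homomorphism that sends every value to the single constant `a`."""
    return {(rel, (a,) * len(args)) for rel, args in instance}


def choices_over(adom, k):
    """All maps from k values into adom; a singleton adom leaves exactly one choice."""
    return list(product(sorted(adom), repeat=k))
```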

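Lemma 13. If $C$ is a set of TGDs without constants over a schema $\mathbf{S}$
and $\mathcal{V}$ is an instance of the visible part of $\mathbf{S}$, then every instance
$K \in \mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ maps homomorphically to $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$, that is,
$h(K) \subseteq \mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ for some homomorphism $h$.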

Proof. Recall that the instances in $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ are either leaves or limits
of infinite paths of the chase tree. Below, we prove that every instance $K$ in
the chase tree for $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ maps to $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ via some
homomorphism $h$. In addition, we ensure that, if $K'$ is a descendant of $K$ in the
same chase tree, then the corresponding homomorphism $h'$ is obtained by
composing some homomorphism with an extension of $h$. This way of constructing
homomorphisms is compatible with limits in the following sense: if $h_0, h_1, \dots$
are homomorphisms mapping instances $K_0, K_1, \dots$ along an infinite path of the
chase tree, then there is a homomorphism $\lim_{n \in \mathbb{N}} h_n$ that maps the limit instance
$\lim_{n \in \mathbb{N}} K_n$ to $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$.

For the base case of the induction, we consider the initial instance $\mathcal{V}$ at
the root of the chase tree, which clearly maps homomorphically to $\mathcal{V}_{\{a\}}$. For
the inductive case, we consider an instance $K$ in the chase tree and suppose
that it maps to $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ via a homomorphism $h$. We also consider an
instance $K'$ that is a child of $K$ and is obtained by chasing some dependency
$R_1(\bar x_1) \wedge \dots \wedge R_m(\bar x_m) \rightarrow \exists \bar y\, S(\bar z)$, where $\bar z$ is a sequence of variables from
$\bar x_1, \dots, \bar x_m, \bar y$. This means that there exist two homomorphisms $f$ and $g$ such
that

1. $f$ maps the variables $\bar x_1, \dots, \bar x_m$ to some values in $K$ and maps injectively
the variables $\bar y$ to fresh values;

2. $g$ either maps $f(\bar z)$ to values in the active domain of $\mathcal{V}$ or is the identity
on $f(\bar z)$, depending on whether $S$ is visible or not;

3. $R_j(f(\bar x_j)) \in K$ for all $1 \le j \le m$;

4. $K' = g(K \cup \{S(f(\bar z))\})$.

Note that $h$ maps each fact $R_j(f(\bar x_j))$ in $K$ to $R_j(h(f(\bar x_j)))$ in
$\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. Since $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ satisfies the chased dependency,
it must also contain a fact of the form $S(h'(f(\bar z)))$, where $h'$ is a
homomorphism that extends $h$ on the fresh values $f(\bar y)$. Moreover, if $S$ is visible, then
$h'$ maps all values $f(\bar z)$ to the same value $a$, which is the only element of the
active domain of $\mathcal{V}_{\{a\}}$.

We can now define a homomorphism that maps the instance $K' = g(K \cup \{S(f(\bar z))\})$
to $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. If $S$ is not visible, then we recall that $g$ is
the identity on $f(\bar z)$, and hence $h'$ already maps $K' = g(K \cup \{S(f(\bar z))\}) = K \cup \{S(f(\bar z))\}$
to $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. Otherwise, if $S$ is visible, then we recall that
$g$ maps $f(\bar z)$ to values in the active domain of $\mathcal{V}$, we let $g'$ be the function that
maps all values of the active domain of $\mathcal{V}$ to $a$, and finally we define $h'' = h' \circ g'$.
In this way, $h''$ maps $K' = g(K \cup \{S(f(\bar z))\})$ to $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. □
Now that we have established the key lemmas, we can easily reduce the existence
problem to an instance-based problem (recall that for the moment we assume
that the constraints consist only of TGDs):

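Proof of Theorem 10. Clearly, $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}_{\{a\}}) = \mathit{true}$ implies $\exists\mathrm{PQI}(Q, C, \mathbf{S}) = \mathit{true}$.
For the converse direction, suppose that $\exists\mathrm{PQI}(Q, C, \mathbf{S}) = \mathit{true}$. This
implies the existence of a realizable instance $\mathcal{V}$ such that $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}) = \mathit{true}$. By
Proposition 12, every instance in $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ satisfies the query $Q$.
Moreover, by Lemma 13, every instance in $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ maps homomorphically
to $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. Since $\mathcal{V}$ is realizable, Lemma 11 (applied with $\mathcal{F}_0 = \mathcal{V}$
and any realization of $\mathcal{V}$) guarantees that this collection is non-empty; hence
$\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ also satisfies $Q$, as $Q$ is preserved under homomorphisms. By applying
Proposition 12 again, we conclude that $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}_{\{a\}}) = \mathit{true}$.

Finally, the second statement of the theorem follows from the fact that the
previous proofs are independent of the assumption that relational instances are
finite. □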

Now, we explain how to generalize the proof of Theorem 10 to constraints
consisting of both TGDs and EGDs (still without constants). This can be done
by modifying the chase procedure for $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ so as to also take into account
the EGDs in $C$ that can be triggered on the instances that emerge in the
chase tree. Formally, chasing an EGD of the form $R_1(\bar x_1) \wedge \dots \wedge R_m(\bar x_m) \rightarrow x = x'$,
where $x, x'$ are two variables from $\bar x_1, \dots, \bar x_m$, amounts to applying a
suitable homomorphism that identifies the two values $h(x)$ and $h(x')$ whenever
the facts $R_1(h(\bar x_1)), \dots, R_m(h(\bar x_m))$ belong to the instance under consideration.
Note that this operation leads to a failure (i.e. a dummy instance) when $h(x)$
and $h(x')$ are distinct values from the active domain of the visible part $\mathcal{V}$.
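A minimal Python sketch of this EGD step, under hypothetical conventions of our own: facts are pairs (relation name, tuple of values), an EGD is a pair (body atoms, (x, x')), and the match h is supplied by the caller. Following the text, the step fails only when both values lie in the active domain of $\mathcal{V}$; otherwise the value outside that domain is replaced by the other one (which value survives when neither is visible is an arbitrary choice of the sketch).

```python
BOT = "BOT"  # marker for a failed (dummy) chase step


def chase_egd(K, egd, h, visible_adom):
    """Chase R1(x1) ∧ ... ∧ Rm(xm) -> x = x' on instance K under the match h."""
    body, (x, x2) = egd
    # h must map every body atom to a fact of K
    assert all((r, tuple(h[v] for v in vs)) in K for r, vs in body)
    u, w = h[x], h[x2]
    if u == w:
        return set(K)                      # already satisfied: nothing to do
    if u in visible_adom and w in visible_adom:
        return BOT                         # two distinct visible values: failure
    # keep the visible value if there is one; otherwise keep w (arbitrary choice)
    old, new = (w, u) if u in visible_adom else (u, w)
    return {(r, tuple(new if c == old else c for c in args)) for r, args in K}
```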

With this new definition of $\mathrm{Chases}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V})$ at hand, the proofs of Lemma
11 and Lemma 13 do not pose particular problems, as one just needs to handle
the standard case of an EGD. Finally, the proof of Theorem 10
directly uses Proposition 12 and Lemma 13 as black boxes, and so carries over
without any modification.

It is worth remarking that, by pairing Theorem 10 with the upper bound
and the finite controllability for instance-level problems (Theorem 4), one
immediately obtains the following:

Corollary 14. $\exists\mathrm{PQI}(Q, C, \mathbf{S})$, with $Q$ ranging over Boolean UCQs and $C$ over
sets of frontier-guarded TGDs without constants, is decidable in 2ExpTime,
and is finitely controllable.

In contrast, we show that adding disjunctions or constants to the constraints 
leads to undecidability. We first prove this in the case where the constraints have 
disjunctions. This shows that the interaction of disjunctive linear TGDs and 
linear EGDs (implicit in the visibility assumption) causes the “critical instance” 
reduction to fail. 

Theorem 15. The problem $\exists\mathrm{PQI}(Q, C, \mathbf{S})$ is undecidable as $Q$ ranges over
Boolean UCQs and $C$ over sets of disjunctive linear TGDs.
The proof uses a technique that will be exploited for many of our schema-level
undecidability arguments. We will reduce the existence of a tiling to the
$\exists\mathrm{PQI}$ problem. The tiling itself will correspond to the visible instance that
has a PQI. The invisible relations will store "challenges" to the correctness of
the tiling. The UCQ $Q$ will have disjuncts that return true exactly when the
challenge to correctness is passed. There will be challenges to the labelling of
adjacent cells, challenges to the correctness of the initial tile, and challenges to
the correct shape of the adjacency relationship (that is, challenges that the
tiling is really grid-like). A correct tiling corresponds to every challenge being
passed, and thus corresponds to a visible instance where every extension satisfies
$Q$. The undecidability argument also applies to the "unrestricted version" of
$\exists\mathrm{PQI}$, in which both quantifications range over arbitrary (possibly infinite) instances.
This will also be true for all other undecidability results in this work, which
always concern the schema-level problems.

Proof. For simplicity, we deal with the "unrestricted variant" of the problem,
which asks if there is an arbitrary instance of the visible schema such that every
superinstance satisfying the constraints also satisfies $Q$. We comment below on
how to modify the argument for finite instances.

We reduce the problem of tiling the infinite grid, which is known to be
undecidable, to the problem $\exists\mathrm{PQI}$. Recall that an instance of the tiling
problem consists of a finite set $T$ of available tiles, a set of horizontal and vertical
constraints, given by relations $H, V \subseteq T \times T$, and an initial tile $t_0 \in T$ for the
lower-left corner. The problem consists of deciding whether there is a tiling
function $f : \mathbb{N} \times \mathbb{N} \rightarrow T$ such that

1. $f(0,0) = t_0$,

2. $(f(i,j), f(i+1,j)) \in H$ for all $i, j \in \mathbb{N}$,

3. $(f(i,j), f(i,j+1)) \in V$ for all $i, j \in \mathbb{N}$.
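Conditions 1 to 3 are easy to check mechanically on any finite window of a candidate tiling. The Python sketch below is our own illustration (the function name and the n × n window are hypothetical); since it inspects only a finite prefix of the grid, it can refute a candidate tiling but cannot by itself certify a tiling of the infinite grid.

```python
def valid_window(f, n, H, V, t0):
    """Check conditions 1-3 of the tiling problem on the window {0..n-1} x {0..n-1}.

    f is a function (i, j) -> tile; H and V are sets of allowed tile pairs.
    """
    if f(0, 0) != t0:                                          # condition 1
        return False
    for i in range(n):
        for j in range(n):
            if i + 1 < n and (f(i, j), f(i + 1, j)) not in H:  # condition 2
                return False
            if j + 1 < n and (f(i, j), f(i, j + 1)) not in V:  # condition 3
                return False
    return True
```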

Given an instance $(T, H, V, t_0)$ of the tiling problem, we show how to
construct a schema $\mathbf{S}$, a query $Q$, and a set $C$ of disjunctive IDs over $\mathbf{S}$ such that
$\exists\mathrm{PQI}(Q, C, \mathbf{S}) = \mathit{true}$ if and only if there is a tiling function for $(T, H, V, t_0)$.

The idea is to enforce a suitable query and suitable constraints in such a way that the
visible instance that witnesses $\exists\mathrm{PQI}$ represents a candidate tiling, and invisible
instances represent challenges to the correctness of the tiling. We use attributes
to denote cells of the grid and we use two visible binary relations $E_H$ and $E_V$
to represent the horizontal and vertical edges of the grid. We also introduce a
unary visible relation $U_t$ for each tile $t \in T$ in order to represent a candidate
tiling function on the grid.

We begin by enforcing the existence of an initial node with the associated
tile $t_0$. For this, we introduce another visible relation $\mathit{Init}$, of arity 0, and the linear
TGD

$$\mathit{Init} \rightarrow \exists x\, U_{t_0}(x).$$
It is also easy to require that every node is connected to at least one other node
in the relation $E_H$ (resp., $E_V$), and that the latter node has an associated tile
that satisfies the horizontal constraints $H$ (resp., the vertical constraints $V$). To
do so we use the disjunctive linear TGDs

$$U_t(x) \rightarrow \exists y\, E_H(x,y) \wedge \bigvee_{(t,t') \in H} U_{t'}(y) \qquad \text{(for all tiles } t \in T\text{)}$$

$$U_t(x) \rightarrow \exists z\, E_V(x,z) \wedge \bigvee_{(t,t') \in V} U_{t'}(z) \qquad \text{(for all tiles } t \in T\text{)}$$

We now turn to explain how to enforce a grid structure on the relations $E_H$ and
$E_V$, and to guarantee that each node has exactly one tile associated with it.
Of course, we cannot directly use disjunctive TGDs in order to guarantee that
$E_H$ and $E_V$ correctly represent the horizontal and vertical edges of the grid.
However, we can introduce additional hidden relations that make it possible to
mark certain nodes so as to expose possible violations. We first show how to
expose violations of the requirement that the horizontal edge relation is a function. The
idea is to select nodes in $E_H$ in order to challenge functionality. Formally, the
horizontal challenge is captured by a hidden ternary relation $\mathit{HChallenge}$, by the
linear TGDs

$$\mathit{Init} \rightarrow \exists x\, y\, y'\, \mathit{HChallenge}(x, y, y')$$

$$\mathit{HChallenge}(x, y, y') \rightarrow E_H(x,y) \wedge E_H(x,y')$$
and by the CQ

$$Q_H = \exists x\, y\, \mathit{HChallenge}(x, y, y).$$

Note that if the visible fact $\mathit{Init}$ is present and the relation $E_H$ correctly describes
the horizontal edges of the grid, then the above query $Q_H$ is necessarily satisfied
by any instance of $\mathit{HChallenge}$ that satisfies the above constraints: the only way
to give a non-empty instance for $\mathit{HChallenge}$ is to use triples of the form
$(x, y, y)$. Conversely, if the relation $E_H$ is not a function, namely, if there exist
nodes $x, y, y'$ such that $(x,y), (x,y') \in E_H$ and $y \neq y'$, then the singleton instance
$\{(x, y, y')\}$ for the hidden relation $\mathit{HChallenge}$ will satisfy the associated
constraint and violate the query $Q_H$. Note that we do not require that the
relation $E_H$ is injective (this could still be done, but is not necessary for the
reduction). Similarly, we can use a hidden relation $\mathit{VChallenge}$ and analogous
constraints and a query $Q_V$ in order to challenge the functionality of $E_V$.
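The challenge mechanism can be mirrored by a small search for witnesses. In this Python sketch of ours, $E_H$ is modelled as a set of pairs (a hypothetical encoding); every returned triple (x, y, y') can serve as the singleton content of HChallenge, since it satisfies the constraint HChallenge(x, y, y') → E_H(x, y) ∧ E_H(x, y') and, because y ≠ y', falsifies Q_H. An empty result means that E_H is a function.

```python
def horizontal_challenges(E_H):
    """Triples (x, y, y2) with E_H(x, y), E_H(x, y2) and y != y2."""
    succ = {}
    for x, y in E_H:
        succ.setdefault(x, set()).add(y)
    return {(x, y, y2) for x, ys in succ.items() for y in ys for y2 in ys if y != y2}
```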

In the same way, we can challenge the confluence of the relations $E_H$ and
$E_V$. For this, we introduce a hidden relation $\mathit{CChallenge}$ of arity 5, which is
associated with the constraints

$$\mathit{Init} \rightarrow \exists x\, y\, z\, w\, w'\, \mathit{CChallenge}(x, y, z, w, w')$$
$$\mathit{CChallenge}(x, y, z, w, w') \rightarrow E_H(x,y) \wedge E_V(x,z) \wedge E_V(y,w) \wedge E_H(z,w')$$
and the CQ

$$Q_C = \exists x\, y\, z\, w\, \mathit{CChallenge}(x, y, z, w, w).$$

As before, we can argue that there is a positive query implication for $Q_C$ iff the
horizontal and vertical edge relations are confluent, that is, $(x,w) \in E_H \circ E_V$
and $(x,w') \in E_V \circ E_H$ imply $w = w'$.
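Analogously, here is a Python sketch (again our own illustration, with both edge relations encoded as sets of pairs) that lists the 5-tuples that could populate CChallenge while falsifying Q_C. Each tuple witnesses $(x,w) \in E_H \circ E_V$ and $(x,w') \in E_V \circ E_H$ with $w \neq w'$, so the result is empty exactly when the confluence condition holds.

```python
def confluence_challenges(E_H, E_V):
    """Tuples (x, y, z, w, w2) with E_H(x,y), E_V(x,z), E_V(y,w), E_H(z,w2), w != w2."""
    return [(x, y, z, w, w2)
            for (x, y) in E_H
            for (x1, z) in E_V if x1 == x
            for (y1, w) in E_V if y1 == y
            for (z1, w2) in E_H if z1 == z and w != w2]
```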
We need to ensure that there does not exist a node $n$ labeled with two tiles,
which means that there do not exist two relations $U_t$ and $U_{t'}$, with $t \neq t'$, such that $n$
belongs to both $U_t(\mathcal{V})$ and $U_{t'}(\mathcal{V})$.

For that we add the two following constraints, where $A$ and $B$ are hidden
relations:

$$\mathit{Init} \rightarrow \exists x\, A(x) \vee B(x)$$

$$B(x) \rightarrow \bigvee_{t, t' \in T,\ t \neq t'} \big(U_t(x) \wedge U_{t'}(x)\big)$$

Finally, we use the CQ

$$Q_A = \exists x\, A(x).$$
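The role of A and B can be checked in the same style. In this Python sketch of ours, the tile relations $U_t$ of a visible instance are modelled as a dict from tile names to sets of nodes (a hypothetical encoding), and the function computes the nodes carrying two distinct tiles. When the result is non-empty, setting A to the empty set and B to this set satisfies both constraints above while falsifying Q_A; when it is empty, B is forced to be empty, so (given Init) the first constraint forces A, and hence Q_A, to be non-empty.

```python
def multiply_tiled(U):
    """Nodes that belong to U_t and U_t' for two distinct tiles t != t'."""
    first_tile, multi = {}, set()
    for tile, nodes in U.items():
        for n in nodes:
            if n in first_tile and first_tile[n] != tile:
                multi.add(n)
            first_tile.setdefault(n, tile)
    return multi
```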

Now that we have described all the visible and hidden relations of the schema $\mathbf{S}$
and the associated constraints $C$, we define the query for the $\exists\mathrm{PQI}$ problem as
the conjunction of the atom $\mathit{Init}$ and all previous UCQs (for this we distribute
the disjunctions and existential quantifications over the conjunctions):

$$Q = \mathit{Init} \wedge Q_A \wedge Q_H \wedge Q_V \wedge Q_C \wedge Q_T.$$

It remains to show that $\exists\mathrm{PQI}(Q, C, \mathbf{S}) = \mathit{true}$ iff there is a correct tiling of the
infinite grid, namely, a function $f : \mathbb{N} \times \mathbb{N} \rightarrow T$ that satisfies the conditions 1),
2), and 3) above.

Suppose there exists a correct tiling $f : \mathbb{N} \times \mathbb{N} \rightarrow T$. We construct the
visible instance $\mathcal{V}$ that contains the fact $\mathit{Init}$ and the relations $E_H$, $E_V$, and
$U_t$ with the intended semantics: $E_H = \{((i,j),(i+1,j)) \mid i,j \in \mathbb{N}\}$,
$E_V = \{((i,j),(i,j+1)) \mid i,j \in \mathbb{N}\}$, and $U_t = \{(i,j) \mid f(i,j) = t\}$ for all $t \in T$. Since
no error can be exposed on the relations $E_H$, $E_V$, and $U_t$, no matter how we
construct a full instance $\mathcal{F}$ that agrees with $\mathcal{V}$ on the visible part and satisfies
the constraints in $C$, we will have that $\mathcal{F}$ satisfies all the components of the
query other than $Q_A$. In addition, in any such $\mathcal{F}$, $B$ must be empty, since
otherwise tiling predicates for distinct tiles would overlap, which is not the case.
Since $\mathit{Init}$ holds, we can conclude via the constraint $\mathit{Init} \rightarrow \exists x\, A(x) \vee B(x)$
that $Q_A$ must hold.
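The intended visible instance can be generated from a tiling function, at least on a finite window. The Python sketch below is our own illustration: it truncates the infinite instance to an n × n window, uses pairs (i, j) as node names, and encodes the relation $U_t$ under the hypothetical name "U_" + t.

```python
def visible_from_tiling(f, n):
    """Visible facts Init, E_H, E_V and U_t restricted to the n x n window."""
    facts = {("Init", ())}
    for i in range(n):
        for j in range(n):
            facts.add(("U_" + f(i, j), ((i, j),)))
            if i + 1 < n:
                facts.add(("E_H", ((i, j), (i + 1, j))))
            if j + 1 < n:
                facts.add(("E_V", ((i, j), (i, j + 1))))
    return facts
```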

Conversely, suppose that $\exists\mathrm{PQI}(Q, C, \mathbf{S}) = \mathit{true}$ and let $\mathcal{V}$ be the witnessing
visible instance. Clearly, $\mathcal{V}$ contains the fact $\mathit{Init}$ (otherwise, the query would be
immediately violated) and it does not contain IF, otherwise the query $Q_A$ would
be violated. We can use the content of $\mathcal{V}$ and the knowledge that $\exists\mathrm{PQI}(Q, C, \mathbf{S}) = \mathit{true}$
to inductively construct a correct tiling of the infinite grid. More precisely,
by the first constraint in $C$, we know that $\mathcal{V}$ contains the fact $U_{t_0}(x)$, for some
node $x$. Accordingly, we define $i_x = 0$, $j_x = 0$, and $f(i_x, j_x) = t_0$. For the
induction step, suppose that $f(i_x, j_x)$ is defined for a node $x$ with the associated
coordinates $i_x$ and $j_x$. The constraints in $C$ enforce the existence of two cells
$y$ and $z$ and two tiles $t$ and $t'$ for which the following facts are in the visible
instance: $E_H(x,y)$, $E_V(x,z)$, $U_t(y)$, and $U_{t'}(z)$. Accordingly, we let $i_y = i_x + 1$,
$j_y = j_x$, $i_z = i_x$, $j_z = j_x + 1$, $f(i_y, j_y) = t$, and $f(i_z, j_z) = t'$. By the initial
constraints, we know that the tiles associated with the new cells $(i_y, j_y)$ and
$(i_z, j_z)$ are consistent with the tile in $(i_x, j_x)$ and with the horizontal and vertical
constraints $H$ and $V$. We now argue that there is a unique choice for the nodes
$y$ and $z$. Indeed, suppose this is not the case; for instance, suppose that there
exist two distinct nodes $y \neq y'$ that are connected to $x$ via $E_H$. Then, we could
construct a full instance in which the relation $\mathit{HChallenge}$ contains the single
triple $(x, y, y')$. This would immediately violate the CQ $Q_H$, and hence $Q$. Similar
arguments apply to the vertical successor $z$.

We now argue that there is a unique choice for the tile $t$ associated with
a node $y$. Suppose not. Then we can set $A$ to be empty and $B$ to be the set of all nodes having
multiple tiles. The constraints are satisfied, while $Q_A$ fails, which contradicts
the assumption that $\mathcal{V}$ has a PQI.

Finally, we can argue along the same lines that, during the next steps of the
induction, the $E_V$-successor of $y$ and the $E_H$-successor of $z$ coincide. The above
properties are sufficient to conclude that the constructed function $f$ is a correct
tiling of the infinite grid.

The variant for finite instances is done by observing that the same reduction 
produces a periodic grid, which can be represented as a finite instance. □ 

Perhaps even more surprisingly, we show that disjunction can be simulated
using constants (under UNA). The proof works by applying the technique of
"coding Boolean operations and truth values in the schema", which has been
used to eliminate the need for disjunction in hardness proofs in several past
works (e.g. [GP03]).

Proposition 16. There is a polynomial-time reduction from $\exists\mathrm{PQI}(Q, C, \mathbf{S})$,
where $Q$ ranges over Boolean UCQs and $C$ over sets of disjunctive linear TGDs,
to $\exists\mathrm{PQI}(Q', C', \mathbf{S}')$, where $Q'$ ranges over Boolean UCQs and $C'$ over sets of
linear TGDs (with constants).

Proof. We transform the schema $\mathbf{S}$ to a new schema $\mathbf{S}'$ as follows. For every
visible (resp., hidden) relation $R$ of $\mathbf{S}$ of arity $k$, we add to $\mathbf{S}'$ a corresponding
visible (resp., hidden) relation $R'$ of arity $k + 1$. The idea is that the additional
attribute of $R'$ represents a truth value, i.e. either the constant 0 or the constant
1, which indicates the presence of a tuple in the original relation $R$. For example,
the fact $R'(\bar a, 1)$ indicates the presence of the tuple $\bar a$ in the relation $R$. We can
then simulate the disjunctions in the constraints of $C$ by using conjunctions and
an appropriate look-up table, which we denote by $\mathit{Or}$. Formally, we introduce
three additional relations $\mathit{Or}$, $\mathit{Check}$, and $\mathit{Init}$, of arities 2, 1, and 0, respectively,
and we let $\mathit{Or}$ and $\mathit{Init}$ be visible and $\mathit{Check}$ be hidden in $\mathbf{S}'$. Then, for every
disjunctive linear TGD in $C$ of the form

$$R(\bar x) \rightarrow \exists \bar y\, S(\bar z) \vee T(\bar z')$$
we add to $C'$ the linear TGD with constants

$$R'(\bar x, 1) \rightarrow \exists \bar y\, b_1\, b_2\, S'(\bar z, b_1) \wedge T'(\bar z', b_2) \wedge \mathit{Or}(b_1, b_2).$$
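The syntactic transformation of a single disjunctive linear TGD is mechanical. This Python sketch of ours renders it as a string; atoms are (relation, variables) pairs, primes mark the relations of $\mathbf{S}'$, and the names b1 and b2 are assumed not to occur in the original TGD.

```python
def atom(rel, args):
    """Render an atom such as S'(x,y,b1)."""
    return f"{rel}({','.join(args)})"


def encode_disjunction(body, exist_vars, left, right):
    """R(x) -> ∃y S(z) ∨ T(z')  becomes  R'(x,1) -> ∃y b1 b2 S'(z,b1) ∧ T'(z',b2) ∧ Or(b1,b2)."""
    (r, xs), (s, zs), (t, zs2) = body, left, right
    lhs = atom(r + "'", [*xs, "1"])
    head = [atom(s + "'", [*zs, "b1"]),
            atom(t + "'", [*zs2, "b2"]),
            atom("Or", ["b1", "b2"])]
    quantified = " ".join([*exist_vars, "b1", "b2"])
    return f"{lhs} -> ∃{quantified} " + " ∧ ".join(head)
```

For example, R(x) → ∃y S(x, y) ∨ T(y) is rendered as R'(x,1) -> ∃y b1 b2 S'(x,y,b1) ∧ T'(y,b2) ∧ Or(b1,b2).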
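We further add to $C'$ the following constraints:

$$\mathit{Init} \rightarrow \mathit{Or}(0,1) \wedge \mathit{Or}(1,0) \wedge \mathit{Or}(1,1)$$

$$\mathit{Init} \rightarrow \exists b_1\, b_2\, \mathit{Or}(b_1, b_2) \wedge \mathit{Check}(b_1) \wedge \mathit{Check}(b_2).$$

Finally, we transform every CQ of $Q$ of the form $\exists \bar y\, S(\bar y)$ into a corresponding
CQ of $Q'$ of the form

$$\exists \bar y\, S'(\bar y, 1) \wedge \mathit{Check}(1) \wedge \mathit{Init}.$$

Note that, if needed, we can even rewrite the CQ above so as to avoid
constants: we introduce another hidden unary relation $\mathit{One}$ and the constraint
$\mathit{Init} \rightarrow \mathit{One}(1)$, and we replace the conjunct $\mathit{Check}(1)$ with $\exists b\, \mathit{Check}(b) \wedge \mathit{One}(b)$.
Below, we prove that $\exists\mathrm{PQI}(Q, C, \mathbf{S}) = \mathit{true}$ iff $\exists\mathrm{PQI}(Q', C', \mathbf{S}') = \mathit{true}$.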

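For the easier direction, we consider a realizable instance $\mathcal{V}$ of the visible part of $\mathbf{S}$ such that
$\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}) = \mathit{true}$. We can easily transform $\mathcal{V}$ into a realizable instance
$\mathcal{V}'$ of the visible part of $\mathbf{S}'$ that satisfies $\mathrm{PQI}(Q', C', \mathbf{S}', \mathcal{V}') = \mathit{true}$. For this it suffices to copy the content
of the visible relations of $\mathcal{V}$ into $\mathcal{V}'$, by properly expanding the tuples with the
constant 1, and then adding the facts $\mathit{Init}$, $\mathit{Or}(0,1)$, $\mathit{Or}(1,0)$, and $\mathit{Or}(1,1)$.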

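As for the converse direction, we consider a realizable instance $\mathcal{V}'$ of the visible part of $\mathbf{S}'$ such
that $\mathrm{PQI}(Q', C', \mathbf{S}', \mathcal{V}') = \mathit{true}$. By the definition of $Q'$ it is clear that $\mathcal{V}'$ contains
the fact $\mathit{Init}$, and hence also the facts $\mathit{Or}(0,1)$, $\mathit{Or}(1,0)$, and $\mathit{Or}(1,1)$. Now,
if we knew that the relation $\mathit{Or}$ contains no other tuples besides $(0,1)$, $(1,0)$,
and $(1,1)$ (that is, for every fact $\mathit{Or}(b_1, b_2)$ in $\mathcal{V}'$, we have $b_1 = 1$ or $b_2 = 1$),
then we could easily transform $\mathcal{V}'$ into a realizable instance $\mathcal{V}$ of the visible part of $\mathbf{S}$ that satisfies
$\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}) = \mathit{true}$. For this we simply select the facts $R'(\bar a, 1)$ in $\mathcal{V}'$, where
$R$ is a visible relation of $\mathbf{S}$, and project away the constant 1.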
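Both directions of the correspondence between visible instances can be sketched in Python. This is our own illustration under hypothetical conventions: relations of $\mathbf{S}'$ are named by appending a prime to the relation names of $\mathbf{S}$, facts are (relation, tuple) pairs, and `or_table_ok` checks the property of Or (every tuple contains a 1) that the converse direction relies on.

```python
def encode_visible(V):
    """Easy direction: pad each visible tuple with 1, then add Init and the Or table."""
    out = {(r + "'", args + (1,)) for r, args in V}
    out |= {("Init", ()), ("Or", (0, 1)), ("Or", (1, 0)), ("Or", (1, 1))}
    return out


def or_table_ok(V2):
    """Every Or(b1, b2) fact has b1 = 1 or b2 = 1."""
    return all(1 in args for r, args in V2 if r == "Or")


def decode_visible(V2, visible):
    """Converse direction: keep the facts R'(a, 1) with R visible and drop the trailing 1."""
    return {(r[:-1], args[:-1]) for r, args in V2
            if r.endswith("'") and r[:-1] in visible and args[-1] == 1}
```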

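It remains to show that the relation $\mathit{Or}$ of $\mathcal{V}'$ indeed contains no other tuples
besides $(0,1)$, $(1,0)$, and $(1,1)$. For the sake of contradiction, suppose that
$\mathcal{V}'$ contains a fact of the form $\mathit{Or}(b_1, b_2)$, with $b_1 \neq 1$ and $b_2 \neq 1$. Since $\mathcal{V}'$ is
realizable, there is a full $\mathbf{S}'$-instance $\mathcal{F}'$ such that $\mathcal{F}' \models C'$ and $\mathrm{Visible}(\mathcal{F}') = \mathcal{V}'$.
Note that $\mathcal{F}'$ may satisfy $Q'$ and, in particular, the conjunct $\mathit{Check}(1)$. Since $\mathit{Check}$
occurs in no constraint body, we may assume without loss of generality that $\mathcal{F}'$ also
contains $\mathit{Check}(b_1)$ and $\mathit{Check}(b_2)$. Then, removing the single fact $\mathit{Check}(1)$ from $\mathcal{F}'$ gives a new instance $\mathcal{F}''$ that still
satisfies the constraints in $C'$ (the second constraint above is witnessed by $\mathit{Or}(b_1, b_2)$),
agrees with $\mathcal{F}'$ on the visible part, and violates
the query $Q'$. This contradicts the fact that $\mathrm{PQI}(Q', C', \mathbf{S}', \mathcal{V}') = \mathit{true}$. □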

From the previous two results we immediately see that the addition of
(distinct) constants leads to undecidability:

Corollary 17. The problem $\exists\mathrm{PQI}(Q, C, \mathbf{S})$ is undecidable as $Q$ ranges over
Boolean CQs and $C$ over sets of linear TGDs (with constants).

We now turn to analysing how the complexity scales with less powerful
constraints, e.g. linear TGDs without constants. As before, we reduce
$\exists\mathrm{PQI}(Q, C, \mathbf{S})$ to $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}_{\{a\}})$. We can then reuse some ideas from [JK84]
to solve the latter problem in polynomial space:
Theorem 18. The problem $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}_{\{a\}})$, as $Q$ ranges over Boolean UCQs
and $C$ over sets of linear TGDs without constants, is in PSpace, and the same
is true for $\exists\mathrm{PQI}(Q, C, \mathbf{S})$.

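Proof. By Proposition 12, checking $\mathrm{PQI}(Q, C, \mathbf{S}, \mathcal{V}_{\{a\}}) = \mathit{true}$ is equivalent to checking
that there is a homomorphism $h$ from $\mathrm{CanonDB}(Q_i)$ of some CQ $Q_i$ of $Q$ to
the instance $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. We can easily guess, nondeterministically in polynomial time, a CQ $Q_i$ of $Q$,
some homomorphism $h$ from $\mathrm{CanonDB}(Q_i)$, and the corresponding image $I$ of
$\mathrm{CanonDB}(Q_i)$ under $h$. Then, it remains to decide whether $I$ is contained in
$\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. Below, we explain how to decide this in polynomial space.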

Recall that the instance $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ is obtained as the limit of a
series of operations that consist of alternately adding new facts according to
the TGDs in $C$ and identifying the values that appear in some visible relation
with the constant $a$. Note that the second type of operation may also affect
tuples that belong to hidden relations (this happens when the values are shared
with facts in the visible instance). Also note that the affected tuples could
have been inferred during previous steps of the chase. Nonetheless, at the exact
moment when a new fact $R(b_1, \dots, b_k)$ is inferred by chasing a linear TGD,
we can detect whether a certain value $b_i$ needs to be eventually identified with
the constant $a$, and in this case we can safely replace the fact $R(b_1, \dots, b_k)$
with $R(b_1, \dots, b_{i-1}, a, b_{i+1}, \dots, b_k)$. More precisely, to decide whether the $i$-th
attribute of $R(\bar b)$ needs to be instantiated with the constant $a$, we test whether
$C$ entails a dependency of the form $R(\bar x) \rightarrow \exists \bar y\, S(\bar z)$, where $\bar x$ is a sequence of
(possibly repeated) variables that has the same equality type as $\bar b$ (i.e. $x(j) = x(j')$
iff $b(j) = b(j')$), $S$ is a visible relation, $\bar z$ is a sequence of variables among
$\bar x, \bar y$, and $x(i) = z(j)$ for some $1 \le j \le |\bar z|$. Note that the above entailment
can be rephrased as a containment problem between two CQs, namely $R(\bar x)$ and
$\exists \bar y\, S(\bar z)$, under a given set of linear TGDs $C$, and we know from [JK84] that
the latter problem is in PSpace. We also observe that, in order to discover
all the values in $R(\bar b)$ that need to be identified with the constant $a$, it is not
sufficient to execute the above analysis only once on each position $1 \le j \le \mathrm{ar}(R)$,
as identifying some values with the constant $a$ may change the equality type of
the fact and thus trigger new dependencies from $C$ (notably, this may happen
when the linear TGDs are not IDs). We thus repeat the above analysis on all
positions of $R$ until the corresponding equality type stabilizes; this can
still be done in polynomial space. After this, we add the resulting fact to the
chase.
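The fixpoint on equality types can be sketched as follows. In this Python sketch of ours, `needs_a(rel, eq_type, i)` is a hypothetical oracle standing for the PSpace entailment test against $C$ from [JK84], and an equality type is encoded as a tuple mapping each position to the first position carrying the same value. Each round that finds a hit replaces at least one value different from a, everywhere in the fact, so at most ar(R) + 1 rounds, and hence polynomially many oracle calls, are needed.

```python
def equality_type(args):
    """Position j is mapped to the first position holding the same value as args[j]."""
    first = {}
    return tuple(first.setdefault(v, j) for j, v in enumerate(args))


def saturate_fact(rel, args, needs_a, a="a"):
    """Identify values of R(b) with the constant `a` until the equality type stabilises."""
    args = list(args)
    while True:
        eq = equality_type(args)
        hits = [i for i in range(len(args)) if args[i] != a and needs_a(rel, eq, i)]
        if not hits:
            return rel, tuple(args)
        for i in hits:
            old = args[i]
            # identify the value everywhere in the fact, not only at position i
            args = [a if v == old else v for v in args]
```

With a toy oracle that always flags position 1 and flags position 2 only once positions 0 and 1 are equal, the fact R(a, c, d) first becomes R(a, a, d) and then, because its equality type has changed, R(a, a, a); this is the cascade that a single pass would miss.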

What we have just described is an alternative construction of
$\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ in which every chase step can be done using a PSpace
subprocedure. We omit the tedious details showing that this alternative
construction gives the same result, in the limit, as the version of the chase that we
introduced at the beginning of Section 3.2 (the arguments are similar to the
proof of Lemma 11).

Below, we explain how to adapt the techniques from [JK84] to this
alternative variant of the chase, in order to decide whether the homomorphic image $I$
of some CQ of $Q$ is contained in $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. For this, it is convenient
to think of $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ as a directed graph, where the nodes represent
the facts in $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$ and the edges describe the inference steps that
derive new facts from existing ones and constraints in $C$; note that, because the
constraints are linear TGDs, each inference step depends on at most one fact.
In particular, the nodes of this graph that have no incoming edge (we call them
roots) are precisely the facts from the instance $\mathcal{V}_{\{a\}}$, and all the other nodes are
reachable from some root. Moreover, by the previous arguments, one can check
in polynomial space whether an edge exists between two given nodes.

Now, we focus on the minimal set of edges that connect all the facts of $I$ to
some roots in the graph. The graph restricted to this set of edges is a forest,
namely, every node in it has at most one incoming edge. Moreover, the height
of this forest is at most exponential in $|I|$, and each level in it contains at most
$|I|$ nodes. Thus, the restricted graph can be explored by a non-deterministic
polynomial-space algorithm that guesses the nodes at a level on the basis of
the nodes at the previous level and the linear TGDs in $C$. The algorithm
terminates successfully once it has visited all the facts in $I$, witnessing that $I$ is
contained in $\mathrm{chase}_{\mathrm{vis}}(C, \mathbf{S}, \mathcal{V}_{\{a\}})$. Otherwise, the computation is rejected after
seeing exponentially many levels. □
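The nondeterministic exploration can be phrased as a verifier for a guessed sequence of levels, which makes the space bound visible: only the previous level, the current level, the visited subset of $I$ and a depth counter are kept. In this Python sketch of ours, `edge(p, c)` is a hypothetical oracle for the PSpace edge test described above, `roots` stands for the facts of $\mathcal{V}_{\{a\}}$, `max_levels` for the exponential height bound, and the guessed levels are supplied as an iterator so that they are never stored all at once.

```python
def verify_levels(levels, roots, edge, goal, max_levels):
    """Accept iff the guessed levels connect every fact of `goal` (the image I) to roots."""
    width = len(goal)                    # each level holds at most |I| facts
    visited = set()
    prev = None
    for depth, level in enumerate(levels):
        if depth >= max_levels or len(level) > width:
            return False                 # reject: too deep or too wide
        if prev is None:
            if not level <= roots:       # the first level must consist of roots
                return False
        elif not all(any(edge(p, c) for p in prev) for c in level):
            return False                 # every fact needs a parent one level up
        visited |= level & goal
        if visited == goal:
            return True                  # all facts of I have been reached
        prev = level                     # only the previous level is retained
    return False
```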

We can derive matching lower bounds by reducing Open-World Query
Answering to $\exists\mathrm{PQI}$:

Proposition 19. For any class of constraints containing linear TGDs, OWQ
reduces to $\exists\mathrm{PQI}$.

Proof. Let $Q$ be a query, $C$ a set of constraints over a schema $\mathbf{S}$, and $\mathcal{F}$ an
instance of the schema $\mathbf{S}$. We show how to reduce the Open-World Query
Answering problem for $Q$, $C$, $\mathbf{S}$, and $\mathcal{F}$ to a problem $\exists\mathrm{PQI}(Q', C', \mathbf{S}')$. The idea
is to create a copy of the instance $\mathcal{F}$ in the hidden part of the schema, which
can then be extended arbitrarily.

Formally, we let the transformed schema $\mathbf{S}'$ consist of all the relations in
$\mathbf{S}$, which are assumed to be hidden, plus an additional visible relation $\mathit{Good}$ of
arity 0. We then introduce a variable $y_b$ for each value $b$ in the active domain of
$\mathcal{F}$, and we let $C'$ contain all the constraints from $C$, plus a constraint of the form
$\mathit{Good} \rightarrow \exists \bar y\, Q_{\mathcal{F}}$, where $\bar y$ contains one variable $y_b$ for each value $b$ in the active
domain of $\mathcal{F}$ and $Q_{\mathcal{F}}$ is the conjunction of the atoms of the form $A(y_{b_1}, \dots, y_{b_k})$,
for all facts $A(b_1, \dots, b_k)$ in $\mathcal{F}$. Note that the visible instance $\mathcal{V}_{\mathit{Good}}$ that contains
the atom $\mathit{Good}$ is realizable, since it can be completed (using the chase) to an
$\mathbf{S}'$-instance $\mathcal{F}'$ that satisfies the constraints $C'$. Let $Q' = Q \wedge \mathit{Good}$. We claim
that $\exists\mathrm{PQI}(Q', C', \mathbf{S}') = \mathit{true}$ if and only if $Q$ is certain with respect to $C$ on $\mathcal{F}$. In
one direction, suppose that $\exists\mathrm{PQI}(Q', C', \mathbf{S}') = \mathit{true}$ holds. The witness visible instance
having a PQI can only be the instance $\mathcal{V}_{\mathit{Good}}$. Consider an instance $\mathcal{F}'$ containing
all facts of $\mathcal{F}$ and satisfying the original constraints. By setting $\mathit{Good}$ to true in
$\mathcal{F}'$, we obtain an instance satisfying $C'$, and since $\mathcal{V}_{\mathit{Good}}$ has a PQI, we know
that this instance must satisfy $Q'$ and hence $Q$. Thus $Q$ is certain with respect
to $C$ on $\mathcal{F}$, as required. Conversely, suppose that $Q$ is certain with respect to $C$ on $\mathcal{F}$.
Letting $C_{\mathcal{F}}$ be the chase of $\mathcal{F}$ with respect to $C$, we see that $C_{\mathcal{F}}$ satisfies $Q$. We
will show that there is a PQI for $Q'$, $C'$, $\mathbf{S}'$ on $\mathcal{V}_{\mathit{Good}}$. Thus fix an instance $\mathcal{F}'$ where
$\mathit{Good}$ and $C'$ hold. The additional constraint implies that $\mathcal{F}'$ contains the image
of $\mathcal{F}$ under some homomorphism $h$. But $h$ extends to a homomorphism of $C_{\mathcal{F}}$
into $\mathcal{F}'$. Thus $\mathcal{F}'$ satisfies $Q$, and therefore satisfies $Q'$. Thus there is a PQI on
$\mathcal{V}_{\mathit{Good}}$, as required.

Thus we have reduced the Open-World Query Answering problem for Q, C, 
and S to the problem ^PQI((5^C', S'). □ 
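The extra constraint Good → ∃y Q_𝓕 is simply the canonical conjunctive query of the copied instance, guarded by Good. The following minimal Python sketch is our own illustration. The encoding of facts as (relation, tuple) pairs and the function names are assumptions, not part of the construction. It renders the constraint and checks, by backtracking, whether another instance contains a homomorphic image of the copied instance, which is exactly when that instance satisfies ∃y Q_𝓕.

```python
from typing import Dict, FrozenSet, List, Tuple

Fact = Tuple[str, Tuple[str, ...]]  # (relation name, tuple of values)


def canonical_cq(instance: FrozenSet[Fact]) -> Tuple[List[str], List[Fact]]:
    """One variable y_b per value b, one atom A(y_b1, ..., y_bk) per fact A(b1, ..., bk)."""
    var_of: Dict[str, str] = {}
    for _, args in sorted(instance):
        for b in args:
            var_of.setdefault(b, f"y_{b}")
    atoms = [(rel, tuple(var_of[b] for b in args)) for rel, args in sorted(instance)]
    return sorted(var_of.values()), atoms


def good_constraint(instance: FrozenSet[Fact]) -> str:
    """Render the extra constraint Good -> exists y Q_F that is added to C'."""
    ys, atoms = canonical_cq(instance)
    body = " ∧ ".join(f"{rel}({', '.join(args)})" for rel, args in atoms)
    return f"Good → ∃{' '.join(ys)} {body}"


def has_homomorphism(src: FrozenSet[Fact], dst: FrozenSet[Fact]) -> bool:
    """True iff some mapping h sends every fact of src to a fact of dst."""
    facts = sorted(src)

    def extend(i: int, h: Dict[str, str]) -> bool:
        if i == len(facts):
            return True
        rel, args = facts[i]
        for rel2, args2 in dst:
            if rel2 != rel or len(args2) != len(args):
                continue
            h2 = dict(h)
            # setdefault binds an unmapped value, or returns its existing image
            if all(h2.setdefault(b, c) == c for b, c in zip(args, args2)):
                if extend(i + 1, h2):
                    return True
        return False

    return extend(0, {})
```

For the instance {A(a, b), B(b)}, good_constraint yields "Good → ∃y_a y_b A(y_a, y_b) ∧ B(y_b)".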

From this and existing lower bounds on Open-World Query Answering ([CFP84] coupled with a reduction from implication to OWQ for linear TGDs, [CGK13] for FGTGDs), we see that the prior upper bounds from Theorem 18 and Corollary 14 are tight:

Corollary 20. The problem ∃PQI(Q, C, S), where Q ranges over CQs and C over sets of linear TGDs, is PSPACE-hard.

Corollary 21. The problem ∃PQI(Q, C, S), where Q ranges over CQs and C over sets of FGTGDs without constants, is 2ExpTime-hard.

3.3 Summary for Positive Query Implication 

The main results on positive query implication are highlighted in the table 
below. 



Constraints               | PQI data                  | PQI combined               | ∃PQI
--------------------------+---------------------------+----------------------------+-----------------------------
NoConst Linear TGD        | ExpTime-cmp (Thm 5/Thm 7) | 2ExpTime-cmp (Thm 4/Thm 9) | PSPACE-cmp (Thm 18/Cor 20)
NoConst FGTGD             | ExpTime-cmp (Thm 5/Thm 7) | 2ExpTime-cmp (Thm 4/Thm 9) | 2ExpTime-cmp (Cor 14/Cor 21)
NoConst Disj. Linear TGD  | ExpTime-cmp (Thm 5/Thm 7) | 2ExpTime-cmp (Thm 4/Thm 9) | undecidable (Thm 15)
Linear TGD & FGTGD & GNFO | ExpTime-cmp (Thm 5/Thm 7) | 2ExpTime-cmp (Thm 4/Thm 9) | undecidable (Cor 17)


4 Negative Query Implication 

4.1 Instance-level problems 

Here we analyze the complexity of the problem NQI(Q,C,S,V). As in the 
positive case, we begin with an upper bound that holds for a very rich class 
of constraints, which go far beyond referential constraints (and FGTGDs). 

Theorem 22. The problem NQI(Q, C, S, V), as Q ranges over Boolean UCQs and C over sets of GNFO constraints, has 2ExpTime combined complexity and ExpTime data complexity, and it is finitely controllable.
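Proof. As in the positive case, we reduce to unsatisfiability of a GNFO formula. We use a variation of the same formula:

$$\varphi^{\mathrm{NQItoGNF}}_{Q,C,S,V} \;=\; Q \wedge C \wedge \bigwedge_{R \in S_v} \Big( \bigwedge_{R(a) \in V} R(a) \;\wedge\; \forall x\, \big( R(x) \rightarrow \bigvee_{R(a) \in V} x = a \big) \Big)$$

where S_v is the set of visible relations of S. The last conjunct states that the visible part is exactly V, so NQI(Q, C, S, V) = true if and only if this formula is unsatisfiable. The data complexity analysis is as in Theorem 5, since the formulas agree on the part that varies with the instance. □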

We can show that this bound is tight if the class of constraints is rich enough. This follows from our lower bound for positive query implication problems, since we can show that NQI is at least as difficult as PQI for powerful constraints.

Theorem 23. For any class of constraints that includes connected FGTGDs, PQI(Q, C, S, V) reduces in polynomial time to NQI(Q', C', S', V'). When Q, C, S are fixed in the input to this reduction, then Q', C', S' are fixed in the output. Thus, for these classes of constraints, the lower bounds for combined and data complexity given in Theorems 7 and 9 apply to negative query implication as well.


Proof. We first provide a reduction that works with any class of constraints 
allowing arbitrary conjunctions in the left-hand sides (e.g. frontier-guarded 
TGDs). Subsequently, we show how to modify the constructions in order to 
preserve connectedness. 

The schema S' is obtained by copying both the visible and the hidden relations from S and by adding the following relations: a visible relation Error of arity 0 and a hidden relation Good of arity 0. The constraints C' will contain the same constraints from C, plus one frontier-guarded TGD of the form

$$Q_i(y) \wedge \mathit{Good} \rightarrow \mathit{Error}$$

for each CQ of Q of the form ∃y Q_i(y). Finally, the query and the visible instance for NQI are defined as follows: Q' = Good and V' = V. In particular, Error is false in V'.

We now verify that PQI(Q, C, S, V) = false iff NQI(Q', C', S', V') = false. Suppose that PQI(Q, C, S, V) = false, namely, that there is an S-instance 𝓕 such that 𝓕 ⊭ Q, 𝓕 ⊨ C, and Visible(𝓕) = V. Let 𝓕' be the S'-instance obtained from 𝓕 by adding the single hidden fact Good. Clearly, 𝓕' satisfies the query Q' and also the constraints in C'; in particular, it satisfies every constraint Q_i(y) ∧ Good → Error because 𝓕 violates every disjunct ∃y Q_i(y) of Q. Hence, we have NQI(Q', C', S', V') = false. Conversely, suppose that NQI(Q', C', S', V') = false, namely, that there is an S'-instance 𝓕' such that 𝓕' ⊨ Q', 𝓕' ⊨ C', and Visible(𝓕') = V'. By copying the content of 𝓕' for those relations that belong to the schema S, we obtain an S-instance 𝓕 that satisfies the constraints C and such that Visible(𝓕) = V. Moreover, because 𝓕' contains the fact Good but not the fact Error, 𝓕' violates every disjunct ∃y Q_i(y) of Q, and so does 𝓕. This shows that PQI(Q, C, S, V) = false.
We observe that the constraints in the above reduction use left-hand sides that are not connected. In order to preserve connectedness, it is sufficient to modify the above constructions by adding a dummy variable that is shared among all atoms. More precisely, we expand the relations of the schema S and the relation Good with a new attribute, and we introduce a new visible relation Check of arity 1. The dummy variable will be used to enforce connectedness in the left-hand sides, and the relation Check will gather all the values associated with the dummy attribute. Using the visible instance, we can also check that the relation Check contains exactly one value. The constraints are thus modified as follows. Every constraint R_1(x_1) ∧ ... ∧ R_m(x_m) → ∃y S(z) in C' is transformed into R_1(x_1, w) ∧ ... ∧ R_m(x_m, w) → ∃y S(z, w). In particular, note that the constraint Q_i(y) ∧ Good → Error becomes Q_i(y, w) ∧ Good(w) → Error(w), which is now a connected frontier-guarded TGD. Furthermore, for every relation R(x) in S, we add the constraint

$$R(x, w) \rightarrow \mathit{Check}(w)$$

and we do the same for the relation Good:

$$\mathit{Good}(w) \rightarrow \mathit{Check}(w).$$

Finally, the query is transformed into Q' = ∃w Good(w), and the visible instance V' is expanded with a fresh dummy value a on the additional attribute and with the visible fact Check(a). □
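The connectedness transformation is purely syntactic. Below is a small Python sketch of it. The encoding of a TGD as a pair of atom lists and all function names are our own hypothetical choices. The sketch appends the dummy variable w to every atom, generates R(x, w) → Check(w), and tests connectedness of a left-hand side through its variable-sharing graph. The sample CQ Q_i(y) = R(y1) ∧ S(y2) used in the check is hypothetical.

```python
from typing import List, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (relation name, argument variables)
TGD = Tuple[List[Atom], List[Atom]]  # (left-hand side, right-hand side)


def add_dummy(atom: Atom, w: str = "w") -> Atom:
    """Append the shared dummy variable w as a new last attribute."""
    rel, args = atom
    return rel, args + (w,)


def connect(tgd: TGD, w: str = "w") -> TGD:
    """R1(x1) ∧ ... ∧ Rm(xm) → S(z)  becomes  R1(x1, w) ∧ ... ∧ Rm(xm, w) → S(z, w)."""
    lhs, rhs = tgd
    return [add_dummy(a, w) for a in lhs], [add_dummy(a, w) for a in rhs]


def check_rule(rel: str, arity: int, w: str = "w") -> TGD:
    """The extra constraint R(x, w) → Check(w), for R of the given original arity."""
    xs = tuple(f"x{i}" for i in range(1, arity + 1))
    return [(rel, xs + (w,))], [("Check", (w,))]


def is_connected(atoms: List[Atom]) -> bool:
    """Atoms are connected when their variable-sharing graph is connected."""
    if not atoms:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, (_, args) in enumerate(atoms):
            if j not in seen and set(args) & set(atoms[i][1]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(atoms)
```

On Q_i(y) ∧ Good → Error with Q_i = R(y1) ∧ S(y2), the left-hand side is disconnected before the transformation and connected after it, since every atom then shares w.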

As mentioned in the body, from the above reduction and from Theorems 7 and 9, we get the following hardness results for instance-level NQI.

Corollary 24. There are a Boolean UCQ Q and a set C of IDs over a schema S for which the problem NQI(Q, C, S, V) is ExpTime-hard in data complexity (that is, as V varies over instances).

Corollary 25. The problem NQI(Q, C, S, V), as C ranges over sets of connected frontier-guarded TGDs, S over schemas, Q over conjunctive queries, and V over instances, is 2ExpTime-hard.


Thus far, the negative query implication results have been similar to the positive ones. We will now show a strong contrast in the case of IDs and linear TGDs. Recall that the PQI problems were highly intractable even for fixed schema, query, and constraints. We begin by showing that NQI(Q, C, S, V) can be solved easily by looking only at full instances that agree with V on the visible part and whose active domains are the same as that of V:

Definition 26. The problem NQI(Q, C, S, V) is said to be active domain controllable if it is equivalent to asking that for every instance 𝓕 over the active domain of V, if 𝓕 satisfies C and V = Visible(𝓕), then Q(𝓕) = false.
It is clear that the problem NQI(Q, C, S, V) is simpler when it is active domain controllable, as in this case we could guess a full instance 𝓕 over the active domain of V that agrees with V on the visible part, and then reduce the problem to checking whether 𝓕 satisfies C and Q.
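To make the "guess a full instance over the active domain" step concrete, here is a brute-force Python sketch of our own. It replaces the nondeterministic guess by enumerating every hidden part over adom(V). To keep it short, constraints and query are passed as Python predicates, and hidden relation names are assumed to be distinct from visible ones. A False answer always comes with a witness. A True answer is only reliable when the problem is active domain controllable. The enumeration is exponential and is meant as an illustration only.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Fact = Tuple[str, Tuple[str, ...]]
Instance = FrozenSet[Fact]


def adom(inst: Instance) -> set:
    return {v for _, args in inst for v in args}


def nqi_over_adom(visible: Instance, hidden_arity: Dict[str, int],
                  sat_C: Callable[[Instance], bool],
                  sat_Q: Callable[[Instance], bool]) -> Tuple[bool, Optional[Instance]]:
    """Return (NQI value, witnessing full instance or None), trying hidden parts over adom(V)."""
    dom = sorted(adom(visible))
    candidates = [(r, t) for r, k in sorted(hidden_arity.items())
                  for t in product(dom, repeat=k)]
    for mask in range(2 ** len(candidates)):
        hidden = {f for i, f in enumerate(candidates) if mask >> i & 1}
        full = frozenset(visible) | hidden  # hidden facts never change Visible(full)
        if sat_C(full) and sat_Q(full):
            return False, full              # a witness: NQI fails
    return True, None
```

With the hypothetical ID R(x, y) → S(x), query ∃x y R(x, y), and V = {S(a)}, the search finds the witness {S(a), R(a, a)}, so NQI is false.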

We give a simple argument that NQI under IDs is active domain controllable. Let C be a set of IDs over a schema S, Q be a UCQ, and V be a visible instance such that NQI(Q, C, S, V) = false. This means that there is a full instance 𝓕 such that 𝓕 ⊨ C, Visible(𝓕) = V, and 𝓕 ⊨ Q. Now take any value a ∈ adom(V) (i.e., in the active domain of V) and let h be the homomorphism that is the identity over adom(V) and maps any other value from adom(𝓕) \ adom(V) to a. Since the constraints C are IDs (in particular, since the left-hand side atoms do not have repeated occurrences of the same variable), we know that h(𝓕) ⊨ C. Similarly, we have h(𝓕) ⊨ Q, since UCQs are preserved under homomorphisms. Moreover, h is the identity on adom(V), so Visible(h(𝓕)) = V. Hence, h(𝓕) is an instance over the active domain of V that equally witnesses NQI(Q, C, S, V) = false.
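The collapsing homomorphism h is easy to state concretely. In the Python sketch below, which is our own illustration, the ID H(x, y) → ∃z H(y, z) and the instance are hypothetical. A witness that uses a fresh value n1 is mapped onto the active domain {a}, and both the ID and the visible part survive the collapse.

```python
from typing import FrozenSet, Set, Tuple

Fact = Tuple[str, Tuple[str, ...]]


def collapse(full: FrozenSet[Fact], visible_adom: Set[str], a: str) -> FrozenSet[Fact]:
    """Apply h: identity on adom(V), every other value mapped to a (a must be in adom(V))."""
    if a not in visible_adom:
        raise ValueError("a must belong to adom(V)")

    def h(v: str) -> str:
        return v if v in visible_adom else a

    return frozenset((rel, tuple(h(v) for v in args)) for rel, args in full)


def satisfies_id(inst: FrozenSet[Fact]) -> bool:
    """Hypothetical ID used for illustration: H(x, y) → ∃z H(y, z)."""
    return all(any(r2 == "H" and args2[0] == args[1] for r2, args2 in inst)
               for r, args in inst if r == "H")
```

For 𝓕 = {S(a), H(a, n1), H(n1, a)} with visible part {S(a)}, collapse returns {S(a), H(a, a)}, which still satisfies the ID and has the same visible part.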

The following example shows that NQI with linear TGDs is not always active domain controllable.

Example 3. Let S be the schema with a hidden relation R of arity 2, with two visible relations S, T of arities 1 and 0, respectively, and with the constraints:

$$R(x, y) \rightarrow S(x) \qquad R(x, x) \rightarrow T.$$

Note that the constraints are linear TGDs and they are even full, with no existential quantifiers on the right. The conjunctive query is Q = ∃x y R(x, y). Further, let the visible instance V consist of the single fact S(a). Clearly, every full instance 𝓕 over the active domain {a} that satisfies both C and Q must also contain the facts R(a, a) and T, and so such an instance cannot agree with V in the visible part. On the other hand, the instance that contains the facts S(a) and R(a, b), for a fresh value b, satisfies both C and Q and moreover agrees with V. This shows that NQI(Q, C, S, V) is not active domain controllable.
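Example 3 is small enough to check exhaustively. The Python sketch below is our own, and its fact encoding is an assumption. It enumerates every content of the hidden relation R over a given domain. Over the active domain {a} no witness exists. Allowing one fresh value b yields exactly the witness {S(a), R(a, b)}.

```python
from itertools import product


def satisfies_C(facts):
    """Constraints of Example 3: R(x, y) -> S(x) and R(x, x) -> T (T nullary)."""
    for rel, args in facts:
        if rel == "R":
            x, y = args
            if ("S", (x,)) not in facts:
                return False
            if x == y and ("T", ()) not in facts:
                return False
    return True


def satisfies_Q(facts):
    """Q = exists x y R(x, y)."""
    return any(rel == "R" for rel, _ in facts)


def witnesses(visible, domain):
    """Full instances over `domain` that agree with `visible` and satisfy C and Q."""
    pairs = list(product(sorted(domain), repeat=2))
    found = []
    for mask in range(2 ** len(pairs)):
        hidden = {("R", p) for i, p in enumerate(pairs) if mask >> i & 1}
        full = frozenset(visible) | hidden  # only R is hidden, so Visible(full) = visible
        if satisfies_C(full) and satisfies_Q(full):
            found.append(full)
    return found


V = {("S", ("a",))}
print(len(witnesses(V, {"a"})))       # 0: nothing over the active domain {a}
print(len(witnesses(V, {"a", "b"})))  # 1: the instance {S(a), R(a, b)}
```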

Despite the above example, we show that we can still transform any schema 
with linear TGDs (which may include constants) and any query for an NQI 
problem so as to enforce active domain controllability. Furthermore, we can do 
so while preserving the visible instance of the problem: 

Theorem 27. Given a schema S, a set C of linear TGDs (possibly including constants), and a UCQ Q, one can construct in exponential time a new schema S', a set C' of linear TGDs (with constants), and a UCQ Q' such that S_v = S'_v and, for all instances V over S_v:

1. NQI(Q, C, S, V) = NQI(Q', C', S', V),

2. NQI(Q', C', S', V) is active domain controllable,

3. the number of constraints of C' is exponential in that of C, but each constraint of C' is polynomial in the maximum size of the constraints of C, and similarly for the number and size of CQs in Q and Q'.

Proof. The main idea of the transformation is to select those attributes in a relation of S that are going to be instantiated with values from the active domain adom(V) of V. Since we do not know in advance which attributes need to be selected (this depends in particular on the instance V), we introduce several copies of the hidden relations of S, one for each possible choice of the set of selected attributes. Moreover, in order to be able to correctly reconstruct some witnessing instances, we need to recall the equality relationships enforced on those attributes that are not selected. This is necessary since there might exist linear TGDs with repeated variables in the left-hand side that are activated only when the corresponding attributes carry the same value. For similar reasons (e.g., presence of constants in the left-hand side of a TGD), we must also recall the equalities enforced between unselected attributes and constants outside adom(V).

Formally, we define an equality pattern over a set I of attributes as a sequence of (possibly repeated) variables and constants indexed by I. Like the atoms of a TGD, equality patterns can be used to recall equality relationships between attributes and constants, but not inequalities. We generically denote equality patterns by u, v, etc. Moreover, we compare equality patterns up to variable renaming, that is, we write u ≈ v whenever u = h(v) holds for some injective function h from variables to variables. Similarly, we write u ≼ v whenever u = h(v) for some (possibly non-injective) function h from variables to variables and constants. For example, u ≼ v for u = xycxy and v = xyzwy (take h(z) = c, h(w) = x, and h the identity on x and y). For each ≈-equivalence class, we fix, once and for all, an equality pattern that acts as a representative of the class.
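The two orders on equality patterns can be checked directly, because the requirement u = h(v) determines h position by position. In the Python sketch below, which is ours, variables are modelled by a small Var class and constants by plain strings. This encoding is an assumption for illustration. The sketch confirms the example u ≼ v for u = xycxy and v = xyzwy.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A variable of an equality pattern; plain strings are constants (our convention)."""
    name: str


Token = Union[Var, str]
Pattern = Tuple[Token, ...]


def pat(s: str, consts: str = "") -> Pattern:
    """Build a pattern from a string such as 'xycxy'; letters listed in consts are constants."""
    return tuple(ch if ch in consts else Var(ch) for ch in s)


def find_h(u: Pattern, v: Pattern, injective: bool) -> Optional[Dict[Var, Token]]:
    """Return h with u = h(v), where h maps variables and fixes constants, or None."""
    if len(u) != len(v):
        return None
    h: Dict[Var, Token] = {}
    for ut, vt in zip(u, v):
        if isinstance(vt, Var):
            if h.setdefault(vt, ut) != ut:
                return None  # the same variable of v would need two images
        elif vt != ut:
            return None      # constants of v must be matched literally
    if injective:
        images = list(h.values())
        if not all(isinstance(t, Var) for t in images) or len(set(images)) != len(images):
            return None
    return h


def preceq(u: Pattern, v: Pattern) -> bool:  # u ≼ v
    return find_h(u, v, injective=False) is not None


def approx(u: Pattern, v: Pattern) -> bool:  # u ≈ v
    return find_h(u, v, injective=True) is not None
```

For instance, preceq(pat("xycxy", consts="c"), pat("xyzwy")) holds, while approx fails on the same pair because h must send z to the constant c and both z and x to x.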

The transformed schema S' contains all the visible relations of the original schema S, plus one relation R_{I,u} of arity |I| for each hidden relation R in S, each set I ⊆ {1, ..., ar(R)}, and each representative u of an ≈-equivalence class over Ī = {1, ..., ar(R)} \ I. Even if we do not enforce it explicitly, the attributes that are selected in a copy R_{I,u} of R are meant to contain only values ranging over the active domain of V. For convenience, we also denote the visible relations in the transformed schema S' by R_{I,u}, where I is assumed to be the full set of attributes {1, ..., ar(R)} and u is the trivial equality pattern over the empty set.

Accordingly, every constraint in C involving some relations R and S is translated to several analogous constraints in C' involving relations of the form R_{I,u} and S_{J,v}. More precisely, we consider every linear TGD in C of the form

$$R(x) \rightarrow \exists y\; S(z)$$

where x is a sequence of variables and constants, y is a sequence of variables, and z is a sequence of variables and constants from x and y. We also consider every possible pair of relations R_{I,u} and S_{J,v} in S'. Then, for each of the above choices, we add to C' the linear TGD

$$R_{I,u}(x|_I) \rightarrow \exists y\; S_{J,v}(z|_J)$$

provided that the following conditions hold:

1. I contains at least the positions of the left-hand side R(x) that are associated with constants from adom(V),

2. J contains at least the positions of the right-hand side S(z) that are associated with constants from adom(V),

3. if x(i) = z(j), then i ∈ I iff j ∈ J,

4. x|Ī ≼ u, where Ī = {1, ..., ar(R)} \ I,

5. z|J̄ ≼ v, where J̄ = {1, ..., ar(S)} \ J

(of course, if R is visible then there is a single choice for I and u, and similarly for J and v when S is visible).

The transformation of the query Q is similar: for all CQs

$$\exists y\; S_1(z_1) \wedge \dots \wedge S_n(z_n)$$

in Q and for all sequences of relations S_{1,J_1,v_1}, ..., S_{n,J_n,v_n} in S' such that z_1|J̄_1 ≼ v_1, ..., z_n|J̄_n ≼ v_n, where J̄_1 = {1, ..., ar(S_1)} \ J_1, ..., J̄_n = {1, ..., ar(S_n)} \ J_n, we add as a disjunct of Q' the CQ

$$\exists y\; S_{1,J_1,v_1}(z_1|_{J_1}) \wedge \dots \wedge S_{n,J_n,v_n}(z_n|_{J_n}).$$

It remains to prove that for all visible instances V, NQI(Q, C, S, V) = NQI(Q', C', S', V), and that the latter problem is active domain controllable.

For the easier direction, suppose that NQI(Q, C, S, V) = false, namely, that there is an S-instance 𝓕 such that 𝓕 ⊨ C, 𝓕 ⊨ Q, and Visible(𝓕) = V. We define the corresponding S'-instance 𝓕' as follows: for each relation R in S and each copy R_{I,u} of it in S', we instantiate R_{I,u} with the set of tuples of the form a|I, where a ∈ R, a|Ī ≼ u, and Ī = {1, ..., ar(R)} \ I. The instance 𝓕' constructed in this way satisfies both the query Q' and the constraints in C'. Moreover, 𝓕' ranges over the active domain of V and satisfies Visible(𝓕') = Visible(𝓕) = V.

Conversely, suppose that NQI(Q', C', S', V) = false, namely, that there is an S'-instance 𝓕' such that 𝓕' ⊨ C', 𝓕' ⊨ Q', and Visible(𝓕') = V. We need to give an instance for every relation R in S so as to witness NQI(Q, C, S, V) = false. For the visible relations, we simply copy their content from the instance 𝓕'. For the hidden relations R, the construction is more complicated, as it requires merging the contents of the different copies R_{I,u} in S'. Formally, we fix an extension D of the active domain of V that contains k additional fresh values, where k is the maximum arity of the relations of S. Then, we consider a fact R_{I,u}(a) in the instance 𝓕' and a candidate tuple b ∈ D^{ar(R)}. For each such choice, we add the fact R(b) to 𝓕, provided that I contains exactly those positions of b that are associated with values in the active domain of V, b|I = a, and b|Ī ≈ u, where Ī = {1, ..., ar(R)} \ I. For example, if R has arity 5, I = {1, 2}, u = xyzcz, for some constant c ∉ adom(V), and R_{I,u} contains the tuple a = (a_1, a_2), then we add to 𝓕 all the tuples of the form b = (a_1, a_2, a_3, c, a_3), where a_3 ranges over D \ (adom(V) ∪ {c}). The instance 𝓕 constructed from 𝓕' clearly agrees with 𝓕' on the visible part.

Below we show that 𝓕 satisfies the constraints C and the query Q. Consider any linear TGD of C of the form

$$R(x) \rightarrow \exists y\; S(z)$$

and any fact R(a) that is the image under some homomorphism h of the left-hand side atom R(x). Let I be the set of positions i ∈ {1, ..., ar(R)} such that a(i) ∈ adom(V), and let u = a|Ī, where Ī = {1, ..., ar(R)} \ I. By the previous constructions, we know that R_{I,u}(a|I) is a fact of the instance 𝓕'. Moreover, 𝓕' satisfies a linear TGD of the form

$$R_{I,u}(x|_I) \rightarrow \exists y\; S_{J,v}(z|_J)$$

where J and v can be chosen arbitrarily, provided that they satisfy conditions 1 to 5 above. We derive the existence of a fact in 𝓕' of the form S_{J,v}(b), where b(j) = a(i) whenever z(j) = x(i). Again, by the previous constructions, we know that the tuple b can be extended with values in D so as to obtain a new tuple c, with c ≼ v, that is the image of the right-hand side atom S(z) under some extension of the homomorphism h. This proves that 𝓕 satisfies the linear TGD R(x) → ∃y S(z).

Using similar arguments, one can show that 𝓕 ⊨ Q, and hence NQI(Q, C, S, V) = NQI(Q', C', S', V). To conclude the proof, we observe that the active domain controllability of NQI(Q', C', S', V) follows from the two-way correspondence between the S-instances 𝓕 and the S'-instances 𝓕' and from the fact that the latter instances 𝓕' are directly constructed over the active domain of V. □

Now we show how to exploit active domain controllability to prove that NQI problems can be solved not only efficiently, but “definably” using well-behaved query languages. For this, we introduce a variant of Datalog programs, called GFP-Datalog programs, whose semantics is given by greatest fixpoints. GFP-Datalog programs are defined syntactically in the same way as Datalog programs [AHV95], that is, as finite sets of rules of the form U(x) ← Q(x), where the variables x_i are implicitly universally quantified and Q is a conjunctive query whose free variables are exactly x. As for Datalog programs, we distinguish between extensional (i.e., input) predicates and intensional (i.e., output) predicates. In the above rules we restrict the left-hand sides to contain only intensional predicates. Given a GFP-Datalog program P, the immediate consequence operator for P is the function that, given a database instance M consisting of both extensional and intensional relations, returns the database instance M' where the extensional relations are as in M and the tuples of each intensional relation U are those satisfying Q(M), where Q is any query appearing on the right of a rule with head U. The immediate consequence operator is monotone, and the semantics of the GFP-Datalog program on an extensional database instance I is defined as the greatest fixpoint of this operator starting at the database instance that extends I by setting each intensional relation “maximally”, that is, to the tuples of values from the active domain of I plus the constants appearing in the GFP-Datalog program. A program may also include a distinguished intensional predicate, the goal predicate G, and then the result is taken to be the projection of the greatest fixpoint onto G.
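This semantics can be prototyped with naive iteration. The Python sketch below is our own and uses a hypothetical encoding in which variables are strings starting with "?". It starts from the maximal intensional relations and applies the immediate consequence operator until the result stabilises. It assumes every head variable occurs in the body and all values share one sortable type.

```python
from itertools import product
from typing import Dict, Iterator, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # relation name and terms; variables start with "?"
Rule = Tuple[Atom, List[Atom]]      # head U(x) and conjunctive body Q(x)
DB = Dict[str, Set[tuple]]


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(body: List[Atom], db: DB, binding: Dict[str, object]) -> Iterator[Dict[str, object]]:
    """Enumerate the bindings that map every body atom to a tuple of db."""
    if not body:
        yield binding
        return
    (rel, terms), rest = body[0], body[1:]
    for tup in db.get(rel, set()):
        if len(tup) != len(terms):
            continue
        b, ok = dict(binding), True
        for t, v in zip(terms, tup):
            ok = (b.setdefault(t, v) == v) if is_var(t) else (t == v)
            if not ok:
                break
        if ok:
            yield from matches(rest, db, b)


def gfp_datalog(rules: List[Rule], edb: DB, idb_arity: Dict[str, int], consts=()) -> DB:
    """Greatest fixpoint, starting from maximal intensional relations over adom(edb) ∪ consts."""
    dom = sorted({v for ts in edb.values() for t in ts for v in t} | set(consts))
    idb = {u: set(product(dom, repeat=k)) for u, k in idb_arity.items()}
    while True:
        db = {**edb, **idb}
        new: DB = {u: set() for u in idb_arity}
        for (u, head), body in rules:
            for b in matches(body, db, {}):
                new[u].add(tuple(b[t] if is_var(t) else t for t in head))
        if new == idb:  # the iterates only shrink, so this terminates
            return idb
        idb = new
```

On the program U(x) ← E(x, y) ∧ U(y) over the edges 1→2, 2→3, 3→2, 4→5, the greatest fixpoint keeps exactly the nodes with an infinite outgoing path, namely 1, 2, and 3. The least fixpoint of the same rule would be empty.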
Theorem 28. If Q is a UCQ, C a set of linear TGDs (with constants), and NQI(Q, C, S, V) is active domain controllable, then ¬NQI(Q, C, S, V), viewed as a Boolean query over the visible part V, is definable by a GFP-Datalog program that can be constructed in PTime from Q, C, and S.

Proof. We need to describe by means of a GFP-Datalog program the function ¬NQI(Q, C, S, V) that maps an instance V to either true or false depending on whether or not Q holds over some instance 𝓕 that satisfies the constraints C and such that Visible(𝓕) = V. Thanks to active domain controllability, it is sufficient to consider only full instances constructed over the active domain of V. More precisely, it is sufficient to show that a witnessing instance 𝓕 can be constructed as a greatest fixpoint starting from the values in the active domain of V.

Below, we provide the GFP-Datalog program that computes 𝓕 starting from V. The extensional relations are the ones in the visible part V, while the intensional relations are the ones in the hidden part of the schema S, plus an extra intensional relation A that collects the values in the active domain of V. For each extensional relation R and each position i ∈ {1, ..., ar(R)}, we add the rule A(x_i) ← R(x), which derives all elements from the active domain into A. In addition, for each intensional relation R, we have the rule

$$R(x) \;\leftarrow\; \bigwedge_{1 \le i \le \mathrm{ar}(R)} A(x_i) \;\wedge\; \bigwedge_{\substack{\text{linear TGD in } C \\ \text{of the form } R(x) \rightarrow \exists y\, S(z)}} \exists y\; S(z)$$

Let 𝓕 be the instance consisting of the visible part V and the intensional relations R computed by the above Datalog program under the greatest fixpoint semantics. We claim that 𝓕 satisfies the constraints in C. Indeed, if R(x) → ∃y S(z) is a linear TGD in C and R(a) is a fact of 𝓕, with R(a) the image of R(x) via some homomorphism h, then 𝓕 contains a fact of the form S(b), where b is the image of S(z) via some homomorphism h' that extends h.

To conclude, in order to compute the Boolean query ¬NQI(Q, C, S, V) starting from V, we simply add to the above GFP-Datalog program one rule Goal ← S_1(z_1) ∧ ... ∧ S_n(z_n) for each CQ ∃y S_1(z_1) ∧ ... ∧ S_n(z_n) of Q, and take Goal to be the final output of our program. □
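Operationally, the greatest fixpoint of this program can be obtained by starting from all hidden tuples over adom(V) and repeatedly discarding those that lack a witness for some linear TGD. The following Python sketch is our own illustration and simplifies the setting. It assumes a hypothetical encoding of each TGD R(x) → ∃y S(z) as (R, x, S, z). It also assumes that x consists of distinct variables, that R is hidden, and that z contains no constants. ¬NQI(Q, C, S, V) would then be obtained by evaluating the CQs of Q on the returned instance.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# (R, x, S, z); variables of z that do not occur in x are existential
TGD = Tuple[str, Tuple[str, ...], str, Tuple[str, ...]]


def has_witness(inst: Dict[str, Set[tuple]], a: tuple, tgd: TGD) -> bool:
    """Does inst contain some S(b) with b = h'(z) for an extension h' of h: x -> a?"""
    _, xs, s, zs = tgd
    for b in inst.get(s, set()):
        if len(b) != len(zs):
            continue
        g = dict(zip(xs, a))
        # frontier variables must agree with h; existential ones are bound consistently
        if all(g.setdefault(z, v) == v for z, v in zip(zs, b)):
            return True
    return False


def gfp_hidden(visible: Dict[str, Set[tuple]], hidden_arity: Dict[str, int],
               tgds: List[TGD]) -> Dict[str, Set[tuple]]:
    """Start with every hidden tuple over adom(V); drop facts lacking a TGD witness."""
    dom = sorted({v for ts in visible.values() for t in ts for v in t})
    inst = {r: set(ts) for r, ts in visible.items()}
    for r, k in hidden_arity.items():
        inst[r] = set(product(dom, repeat=k))
    changed = True
    while changed:
        changed = False
        for r in hidden_arity:
            for a in list(inst[r]):
                if not all(has_witness(inst, a, t) for t in tgds if t[0] == r):
                    inst[r].discard(a)
                    changed = True
    return inst
```

Removals cascade. With hypothetical hidden relations H1, H2, TGDs H1(x) → H2(x) and H2(x) → W(x), and the visible relation W empty, first H2 and then H1 become empty.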

Now, recall that the naive fixpoint algorithm for a GFP-Datalog program takes exponential time in the maximum arity of the intensional relations, but only polynomial time in the size of the extensional relations and the number of rules. Thus, from Theorems 27 and 28, we immediately get:

Corollary 29. If C is restricted to range over sets of linear TGDs, then NQI(Q, C, S, V) has data complexity in PTime and combined complexity in ExpTime.

Example 4. Returning to the medical example from the introduction (Example 1), we see that the GFP-Datalog program is quite intuitive: since Patient is empty in the instance and we have a referential constraint from Appointment into Patient, Appointment is removed as well, leaving the empty instance. The program then simply evaluates the query on the resulting instance, which returns false, indicating that an NQI does hold on the original instance.

We do not know whether the use of GFP-Datalog can be replaced by other logics, such as Datalog. However, we can show that in order to logically define ¬NQI(Q, C, S, V) from a given visible instance V, it is necessary to go beyond first-order queries:

Proposition 30. There are CQs Q and sets of IDs C such that NQI(Q, C, S, V) cannot be described by a first-order query over V. More generally, there are CQs Q and sets of IDs C such that NQI(Q, C, S, V) is PTime-hard in data complexity (that is, as V varies over instances).

Proof. To prove that NQI(Q, C, S, V) is not first-order definable from V, we give a reduction from a graph reachability problem. The rough idea is to let some visible relations represent an input graph with two distinguished nodes playing the role of a source and a target. A proof for the existence of a path from the source to the target can then be exposed in the hidden relations. Formally, the nodes and the edges of the graph are encoded by two visible relations N and E of arity 1 and 2, respectively. The source and target nodes are encoded by two singleton relations A and B, respectively, of arity 1. A visible relation P of arity 5 is also introduced in order to drive the induction principle underlying the proof of existence of paths between pairs of nodes. This relation will contain the basic proof steps that can be used to witness reachability between two nodes. Formally, P contains tuples of the form (x, y, i, z, j), where x, y, z are nodes in N and 0 ≤ i, j < |N|, such that:

1. either (y, z) is an edge and j = i + 1, meaning that if x is connected to y by a path of length i and (y, z) is an edge, then x is connected to z by a path of length j = i + 1,

2. or x = y = z and i = j = 0, meaning that every node x is connected to itself by a path of length 0.

We fix V to be our visible instance, which contains the relations N, E, and P. In addition, we introduce a hidden relation T of arity 3 that will be constrained so as to contain only those triples (x, z, j) for which one can witness, using the basic proof steps in P, that x is connected to z by a path of length j. For this it suffices to enforce the following ID:

$$T(x, z, j) \rightarrow \exists y\, i\; T(x, y, i) \wedge P(x, y, i, z, j).$$

It is easy to see that, for every full instance 𝓕 that satisfies the above constraint C and agrees with V in its visible part, if (x, z, j) is a tuple in T, then there is a path from x to z of length j. Conversely, if a node x is connected to a node z by a path of length j, then there is a way to extend the visible instance V with a relation T that satisfies C and contains the tuple (x, z, j). Thus, if we let Q be the CQ ∃x z j A(x) ∧ B(z) ∧ T(x, z, j), then we have NQI(Q, C, S, V) = true iff the A-labelled node is not connected to the B-labelled node. This property is clearly not definable in first-order logic.
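The reduction can be simulated directly. Instances of T that satisfy the constraint are closed under union, so NQI holds exactly when the largest such T contains no triple (a, b, j). The Python sketch below is our own, with hypothetical function names. It builds P from a graph and computes that largest T by repeatedly discarding unsupported triples.

```python
from typing import Iterable, List, Set, Tuple


def build_P(nodes: List[int], edges: Iterable[Tuple[int, int]]) -> Set[tuple]:
    """Basic proof steps (x, y, i, z, j) with 0 <= i, j < |N|."""
    n = len(nodes)
    P = set()
    for x in nodes:
        P.add((x, x, 0, x, 0))              # item 2: x reaches itself by a path of length 0
        for (y, z) in edges:
            for i in range(n - 1):          # keeps j = i + 1 below |N|
                P.add((x, y, i, z, i + 1))  # item 1: extend a path ending in y by edge (y, z)
    return P


def largest_T(nodes: List[int], P: Set[tuple]) -> Set[tuple]:
    """Largest T satisfying T(x,z,j) -> exists y i T(x,y,i) ∧ P(x,y,i,z,j)."""
    n = len(nodes)
    T = {(x, z, j) for x in nodes for z in nodes for j in range(n)}
    changed = True
    while changed:
        changed = False
        for (x, z, j) in list(T):
            supported = any((x, y, i) in T
                            for (x2, y, i, z2, j2) in P if (x2, z2, j2) == (x, z, j))
            if not supported:
                T.discard((x, z, j))
                changed = True
    return T


def nqi(nodes: List[int], edges, a: int, b: int) -> bool:
    """NQI for Q = exists x z j A(x) ∧ B(z) ∧ T(x,z,j) with A = {a} and B = {b}."""
    T = largest_T(nodes, build_P(nodes, edges))
    return not any((a, b, j) in T for j in range(len(nodes)))
```

On the path 1 → 2 → 3 with an isolated node 4, nqi(..., 1, 3) is false because T must be allowed to contain (1, 3, 2). nqi(..., 3, 1) is true.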
A similar technique can be used to prove that NQI(Q, C, S, V) is PTime-hard for data complexity. The idea is to reduce the problem of evaluating a Boolean circuit to NQI(Q, C, S, V). The input and the structure of the Boolean circuit can be easily encoded in some visible relations. In addition, one introduces a visible relation P that contains all the valid rules that can be used during an evaluation. Finally, a hidden relation T can be used to expose a proof that the Boolean circuit evaluates to true. □

We give a tight ExpTime lower bound for the combined complexity of NQI with linear TGDs:

Theorem 31. The combined complexity of NQI(Q, C, S, V), where C ranges over IDs, is ExpTime-hard.

Proof. We reduce the acceptance problem for an alternating PSPACE Turing machine M to NQI(Q, C, S, V). As in the proof of Theorem 7, we assume that the transition function of M maps each configuration to a set of exactly 2 successor configurations. In particular, M never halts. We also assume that M begins its computation with the head on the second position and never visits the first and last positions of the tape. The acceptance condition of M is defined by distinguishing two special control states, q_acc and q_rej, that once reached will ‘freeze’ M in its current configuration. We say that M accepts (the empty input) if for all paths in the computation tree, the state q_acc is eventually reached; otherwise, we say that M rejects.

Differently from the proofs of Theorem 7 and Theorem 9, the configurations of M can be described by simply specifying the label of each cell of the tape, the position of the head, and the control state of the Turing machine M. We thus define cell values as elements of Γ = (Σ × Q) ⊎ Σ, where Σ is the alphabet of M and Q is the set of its control states. If a cell has value (a, q), this means that the associated letter is a, the control state of M is q, and the head is on this cell. Otherwise, if a cell has value a, this means that the associated letter is a and the head of M is not on this cell.

Now, let n be the size of the tape of M. We begin by describing the initial configuration of M. This is encoded by a visible relation C_0 of arity n + 1, where the first attribute gives the identifier of the initial configuration and the remaining n attributes give the values of the tape cells. As the relation C_0 is visible, we can immediately fix its content to be a singleton consisting of the tuple (x_0, y_1, y_2, y_3, ..., y_n), where x_0 is the identifier of the initial configuration, y_1 = ⊥, y_2 = (⊥, q_0), and y_3 = ... = y_n = ⊥. As for the other configurations of M, we store them into two distinct hidden relations C^∃ and C^∀, depending on whether the control states are existential or universal. Each fact in one of these two relations consists of n + 1 attributes, where the first attribute specifies an identifier and the remaining n attributes specify the cell values. We can immediately give the first constraint, which requires the initial configuration to be existential and stored also in the relation C^∃:

$$C_0(x, y_1, \dots, y_n) \rightarrow C^{\exists}(x, y_1, \dots, y_n).$$
To represent the computation tree of M, we encode pairs of subsequent configurations. In doing so, we not only store the identifiers of the configurations, but also their contents, in such a way that we can later check the correctness of the transitions using inclusion dependencies. We use different relations to recall whether the current configuration is existential or universal and, in the latter case, whether the successor configuration is the first or the second one in the transition set (recall that the transition rules of M define exactly two successor configurations from each configuration). Formally, we introduce three hidden relations S^∃, S^∀_1, and S^∀_2, all of arity 2n + 2. We can easily enforce that the first n + 1 and the last n + 1 attributes in every tuple of S^∃, S^∀_1, and S^∀_2 describe configurations in C^∃ and C^∀ (here y and y' abbreviate the tuples y_1, ..., y_n and y'_1, ..., y'_n):

$$S^{\exists}(x,y,x',y') \rightarrow C^{\exists}(x,y) \qquad S^{\exists}(x,y,x',y') \rightarrow C^{\forall}(x',y')$$
$$S^{\forall}_1(x,y,x',y') \rightarrow C^{\forall}(x,y) \qquad S^{\forall}_1(x,y,x',y') \rightarrow C^{\exists}(x',y')$$
$$S^{\forall}_2(x,y,x',y') \rightarrow C^{\forall}(x,y) \qquad S^{\forall}_2(x,y,x',y') \rightarrow C^{\exists}(x',y').$$

Similarly, we guarantee that every existential (resp., universal) configuration has one (resp., two) successor configuration(s) in S^∃ (resp., S^∀_1 and S^∀_2):

$$C^{\exists}(x,y) \rightarrow \exists x' y'\; S^{\exists}(x,y,x',y')$$
$$C^{\forall}(x,y) \rightarrow \exists x' y'\; S^{\forall}_1(x,y,x',y')$$
$$C^{\forall}(x,y) \rightarrow \exists x' y'\; S^{\forall}_2(x,y,x',y').$$

We now turn to explaining how we can enforce the correctness of the transitions represented in the relations S^∃, S^∀_1, and S^∀_2. Compared to the proof of Theorem 7, the goal is simpler in this setting, as we can simply compare the values z_{-1}, z_0, z_{+1} for the cells at positions i - 1, i, i + 1 in a configuration with the value z' for the cell at position i in the successor configuration. We thus introduce new visible relations N^∃, N^∀_1, and N^∀_2 of arity 4. Each of these relations is initialized with the possible quadruples of cell values z_{-1}, z_0, z_{+1}, z' that are allowed by the transition function of M. For example, if the transition function specifies that, when M is in the universal control state q and reads the letter a, then M spawns two subcomputations where the first one begins by rewriting a with a', moving the head to the left, and switching to control state q', then we add to N^∀_1 all the tuples of the form (a_{-1}, (a, q), a_{+1}, a') or (a_{-2}, a_{-1}, (a, q), (a_{-1}, q')), with a_{-2}, a_{-1}, a_{+1} ∈ Σ. Accordingly, we introduce the following IDs, for all 1 < i < n:

$$S^{\exists}(x,y,x',y') \rightarrow N^{\exists}(y_{i-1}, y_i, y_{i+1}, y'_i)$$
$$S^{\forall}_1(x,y,x',y') \rightarrow N^{\forall}_1(y_{i-1}, y_i, y_{i+1}, y'_i)$$
$$S^{\forall}_2(x,y,x',y') \rightarrow N^{\forall}_2(y_{i-1}, y_i, y_{i+1}, y'_i).$$

Furthermore, we constrain the values of the extremal cells to never change:

$$S^{\exists}(x,y,x',y') \rightarrow E(y_1, y'_1) \qquad S^{\exists}(x,y,x',y') \rightarrow E(y_n, y'_n)$$
$$S^{\forall}_1(x,y,x',y') \rightarrow E(y_1, y'_1) \qquad S^{\forall}_1(x,y,x',y') \rightarrow E(y_n, y'_n)$$
$$S^{\forall}_2(x,y,x',y') \rightarrow E(y_1, y'_1) \qquad S^{\forall}_2(x,y,x',y') \rightarrow E(y_n, y'_n)$$

where E is another visible binary relation interpreted by the singleton instance {(⊥, ⊥)}.

It remains to specify the query that checks that the Turing machine M reaches the rejecting state q_rej along some path of its computation tree. For this, we introduce a last visible relation V_rej that contains all cell values of the form (a, q_rej), with a ∈ Σ. The query that checks this property is

$$ Q = \bigvee_{1 \le i \le n} \exists x\, \bar y\, \big( C(x,\bar y) \wedge V_{\mathrm{rej}}(y_i) \big) . $$


Let V be the instance that captures the intended semantics of the visible relations V, C_0, N^∃, N^∀_1, N^∀_2, E, and V_rej. The proof that NQI(Q,C,S,V) = true iff M accepts (namely, has a computation tree where all paths visit the control state q_acc) goes along the same lines as the proof of Theorem 7. □
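The tuples that the transition example above adds to N^∀_1 can be enumerated mechanically. The following Python snippet is only a sketch under our own assumptions: the function name left_move_quadruples is hypothetical, a cell value is encoded either as a tape letter or as a pair (letter, state) for the cell carrying the head, and only the two tuple shapes from the example are generated (windows not covered by the example are not produced here).

```python
from itertools import product


def left_move_quadruples(sigma, a, q, a_new, q_new):
    """Hypothetical helper (not from the paper): quadruples (z_-1, z_0, z_+1, z')
    for a branch that, in state q reading a, writes a_new, moves the head left,
    and switches to state q_new.

    Assumed encoding: a cell value is a letter from sigma, or a pair
    (letter, state) for the cell that carries the head.
    """
    head = (a, q)
    quads = set()
    # The cell at position i carries the head: it now holds the letter a_new.
    for a_m1, a_p1 in product(sigma, repeat=2):
        quads.add((a_m1, head, a_p1, a_new))
    # The cell at position i is just left of the head: it receives the head in q_new.
    for a_m2, a_m1 in product(sigma, repeat=2):
        quads.add((a_m2, a_m1, head, (a_m1, q_new)))
    return quads
```

For example, with sigma = ["0", "1"], a = "0", q = "q", a_new = "1" and q_new = "p", each of the two shapes contributes |Σ|² = 4 tuples.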

4.2 Existence problems 

Here we consider the complexity of the schema-level question ∃NQI(Q,C,S). We first show that when the constraints are preserved under disjoint unions (e.g., connected frontier-guarded TGDs), the existence of an NQI can be checked by considering a single "negative critical instance", namely the empty visible instance ∅. This instance is easily seen to be realizable: the variant of the chase procedure that we introduced in Section 3.2 terminates immediately when initialized with the empty instance ℱ_0 = ∅ and returns the singleton collection ChaseSvis(C, S, ∅) consisting of the empty S-instance, which satisfies C.

Theorem 32. If the constraints C are preserved under disjoint unions of instances, then ∃NQI(Q,C,S) = true iff NQI(Q,C,S,∅) = true.

Proof. Since ∅ is realizable, it is immediate that NQI(Q,C,S,∅) = true implies ∃NQI(Q,C,S) = true. We prove the converse implication by contraposition.

Suppose that NQI(Q,C,S,∅) = false, namely, that there is an S-instance ℱ satisfying C and Q and such that Visible(ℱ) = ∅. We aim at proving that NQI(Q,C,S,V) = false for all realizable visible instances V. Let V be such a realizable instance and let ℱ' be an S-instance that satisfies C and such that Visible(ℱ') = V. We define the new instance ℱ'' as a disjoint union of ℱ and ℱ'. Since the constraints C are preserved under disjoint unions, ℱ'' satisfies C. Moreover, ℱ'' satisfies the query Q, by monotonicity. Since ℱ contributes no visible facts, V = Visible(ℱ') = Visible(ℱ''), and hence NQI(Q,C,S,V) = false. Finally, since V was chosen arbitrarily, this proves that ∃NQI(Q,C,S) = false. □
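The instance ℱ'' built in the proof can be sketched concretely. In this minimal Python illustration (our own encoding, not the paper's), an instance is a finite set of facts (relation, tuple of values), and the values of ℱ are renamed apart from those of ℱ' using hypothetical fresh names _n0, _n1, and so on. Because ℱ has no visible facts, the visible part of the union is exactly that of ℱ'.

```python
from itertools import count


def visible(instance, visible_relations):
    """Visible(F): the facts of F over visible relations."""
    return {fact for fact in instance if fact[0] in visible_relations}


def disjoint_union(f, f_prime):
    """Rename the values of f apart from those of f_prime, then take the union.

    The fresh names "_n0", "_n1", ... are a hypothetical choice for this sketch.
    """
    used = {v for _, args in f_prime for v in args}
    fresh = (f"_n{i}" for i in count() if f"_n{i}" not in used)
    renaming = {}
    for _, args in sorted(f, key=repr):  # sorted only to make the names deterministic
        for v in args:
            if v not in renaming:
                renaming[v] = next(fresh)
    renamed = {(rel, tuple(renaming[v] for v in args)) for rel, args in f}
    return renamed | set(f_prime)
```

Monotonicity of Q is what makes the union still satisfy Q: any match of Q in ℱ becomes a match in the renamed copy.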

Using the "negative critical instance" result above and Theorem 22, we immediately see that ∃NQI(Q,C,S) is decidable in 2ExpTime for GNFO constraints that are closed under disjoint unions, and in particular for connected frontier-guarded TGDs. Combining this with Corollary 29 also gives an ExpTime bound for linear TGDs. In fact, we can improve this upper bound by observing that the NQI problem over the empty visible instance reduces to classical Open-World Query answering:
Proposition 33. For any Boolean CQ Q, NQI(Q,C,S,∅) holds iff OWQ(Q',C,CanonDB(Q)) holds, where

$$ Q' = \bigvee_{R \in S_v} \exists \bar x\, R(\bar x) $$

and CanonDB(Q) is the canonical database of the CQ Q.

Proof. Suppose that NQI(Q,C,S,∅) = true. This means that every S-instance that satisfies the constraints in C and has an empty visible part must violate the query Q. By contraposition, every S-instance that satisfies the constraints C and contains CanonDB(Q) (and hence satisfies Q) must contain some visible facts, and hence satisfy the UCQ Q'. This implies that OWQ(Q',C,CanonDB(Q)) = true.

The proof that OWQ(Q',C,CanonDB(Q)) = true implies NQI(Q,C,S,∅) = true follows by symmetric arguments. □
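The two inputs of the OWQ instance in Proposition 33 are easy to build. The Python sketch below assumes our own encoding of a CQ as a list of atoms (relation, tuple of variables) and a hypothetical constant-naming scheme "c_" + variable. It freezes the variables of Q into constants to obtain CanonDB(Q) and evaluates Q' on a finite instance. Deciding OWQ itself requires reasoning about all models of C and is not attempted here.

```python
def canon_db(cq_atoms):
    """CanonDB(Q): freeze each variable x of the CQ into a constant named "c_" + x.

    The naming scheme is an assumption of this sketch.
    """
    return {(rel, tuple("c_" + x for x in args)) for rel, args in cq_atoms}


def eval_q_prime(instance, visible_relations):
    """Q' = OR over visible R of (exists x. R(x)): true iff some fact is visible."""
    return any(rel in visible_relations for rel, _ in instance)
```

On CanonDB(Q) alone, Q' is false whenever Q has no visible atoms; OWQ asks whether every model of C extending CanonDB(Q) makes it true.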

We know from previous results [BGO10] that OWQ for Boolean UCQs and linear TGDs is in PSpace. From the above reduction, we immediately get that the problem NQI(Q,C,S,∅), and hence (by Theorem 32, since linear TGDs are preserved under disjoint unions) the problem ∃NQI(Q,C,S), is also in PSpace for sets of linear TGDs.

Corollary 34. The problem ∃NQI(Q,C,S), as Q ranges over Boolean UCQs and C over sets of linear TGDs, is in PSpace.

Matching lower bounds for ∃NQI come by a converse reduction from Open-World Query answering.

To prove this reduction, we first provide a characterization of the NQI problem over the empty visible instance, which is based, like Proposition 12, on our chase procedure:

Proposition 35. If Q is a Boolean CQ and C is a set of TGDs and EGDs without constants over a schema S, then NQI(Q,C,S,∅) = true iff either Q contains a visible atom, or it does not and in this case ChaseSvis(C,S,CanonDB(Q)) = ∅.

Proof. If Q contains a visible atom, then every instance satisfying Q has a non-empty visible part, so NQI(Q,C,S,∅) = true trivially; we thus assume that Q contains no visible atoms. We first give the proof when C consists only of TGDs. Suppose that ChaseSvis(C,S,CanonDB(Q)) contains an instance K. Because every instance in ChaseSvis(C,S,CanonDB(Q)) satisfies the constraints in C and the query Q, and has an empty visible part by Lemma 11, we conclude that NQI(Q,C,S,∅) = false. Conversely, suppose that NQI(Q,C,S,∅) = false. This means that there is an S-instance ℱ with no visible facts that satisfies the constraints in C and the query Q. Since ℱ ⊨ Q, there is a homomorphism g from CanonDB(Q) to ℱ. Moreover, since Q contains no visible atoms, the two instances ℱ and CanonDB(Q) agree on the visible part. By Lemma 11, letting ℱ_0 = CanonDB(Q), we get the existence of an instance K in ChaseSvis(C,S,CanonDB(Q)).

In the presence of EGDs, we apply the extension of Lemma 11 to TGDs and EGDs, as discussed earlier. □

As in the positive case, the upper bounds are tight: 
Theorem 36. ∃NQI(Q,C,S) is 2ExpTime-hard as Q ranges over Boolean CQs and C over sets of connected FGTGDs.

Theorem 37. ∃NQI(Q,C,S) is PSpace-hard as Q ranges over Boolean CQs and C over sets of linear TGDs.

The first theorem is proven by reducing the open-world query answering problem to ∃NQI, and then applying a prior 2ExpTime-hardness result from Calì et al. [CGK13]. The PSpace lower bound is shown by a reduction from the implication problem for IDs, shown PSpace-hard by Casanova et al. [CFP84].

To prove lower bounds for ∃NQI, we first give a reduction from Open-World Query answering:

Proposition 38. There is a polynomial-time reduction from the Open-World Query answering problem over a set of connected FGTGDs without constants and a connected Boolean CQ to an ∃NQI problem over a set of connected FGTGDs without constants and a Boolean CQ.

Proof. Consider the Open-World Query answering problem over a schema S, a set C of constraints without constants and closed under disjoint unions, a Boolean CQ Q, and an S-instance ℱ. We reduce this problem to an ∃NQI problem over a new schema S', a new set of constraints C', and a new Boolean CQ Q'. The schema S' is obtained from S by adding a relation Good of arity 0, which is assumed to be the only visible relation in S'. The set of constraints C' is equal to C unioned with the constraint

$$ S_1(\bar x_1) \wedge \dots \wedge S_m(\bar x_m) \rightarrow \mathit{Good} $$

where S_1(x̄_1), ..., S_m(x̄_m) are the atoms in the CQ Q. The query Q' is defined as the canonical query of the instance ℱ, obtained by replacing each value v with a variable y_v and by quantifying existentially over all these variables. Note that CanonDB(Q') is isomorphic to the input instance ℱ.

Now, assume that the original constraints in C were connected FGTGDs and that the CQ Q was also connected. By construction, the constraints in C' are also connected FGTGDs. In particular, the satisfaction of these constraints is preserved under disjoint unions, and hence, by Theorem 32, ∃NQI(Q',C',S') = true iff NQI(Q',C',S',∅) = true. Thus, it remains to show that NQI(Q',C',S',∅) = true iff OWQ(Q,C,ℱ) = true.

By contraposition, suppose that OWQ(Q,C,ℱ) = false. This means that there is an S-instance ℱ' that contains ℱ, satisfies the constraints in C, and violates the query Q. In particular, ℱ', seen as an instance of the new schema S' without the visible fact Good, satisfies the query Q' and the constraints in C' (including the constraint that derives Good from the satisfaction of Q, which holds vacuously because ℱ' violates Q). The S'-instance ℱ' thus witnesses the fact that NQI(Q',C',S',∅) = false.

Conversely, suppose that NQI(Q',C',S',∅) = false. Recall that the constraints in C' do not use constants and Q' contains no visible atoms. We can thus apply Proposition 35 and derive ChaseSvis(C',S',CanonDB(Q')) ≠ ∅. Note that CanonDB(Q') is clearly isomorphic to the original instance ℱ. In particular, there is an instance K in ChaseSvis(C',S',CanonDB(Q')) that contains (an isomorphic copy of) the original instance ℱ, satisfies the constraints in C', and does not contain the visible fact Good. From the latter property, we derive that K violates the query Q. Thus K, seen as an instance of the schema S, witnesses the fact that OWQ(Q,C,ℱ) = false. □
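The syntactic part of the reduction in Proposition 38 is small enough to sketch. In this Python illustration (our own atom encoding; values are assumed to be strings so that the variable names y_v are pairwise distinct), the input instance ℱ becomes the canonical query Q', and the CQ Q becomes the extra constraint deriving Good.

```python
def canonical_query(instance):
    """Q': replace each value v by the variable y_v (all variables existentially quantified).

    Assumes values are strings, so that "y_" + v is injective.
    """
    return sorted((rel, tuple(f"y_{v}" for v in args)) for rel, args in instance)


def good_constraint(cq_atoms):
    """The TGD S1(x1) & ... & Sm(xm) -> Good added to C', as a (body, head) pair."""
    return (list(cq_atoms), ("Good", ()))
```

Freezing each y_v back to v recovers ℱ, which is the isomorphism between CanonDB(Q') and ℱ used in the proof.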

We note that there are two variants of OWQ, corresponding to finite and infinite instances. However, by finite controllability of FGTGDs, inherited from the finite model property of GNFO (see Theorem 1), these two variants agree. Hence we do not distinguish them. Similar remarks hold for other uses of OWQ within proofs in the paper.

We are now ready to prove Theorem 36, namely, the 2ExpTime-hardness of the problem ∃NQI(Q,C,S), where Q ranges over Boolean CQs and C ranges over sets of connected FGTGDs.

Proof of Theorem 36. Theorem 6.2 of Calì et al. [CGK13] shows 2ExpTime-hardness of open-world query answering for FGTGDs. An inspection of the proof shows that only connected FGTGDs are required. Thus, the theorem follows immediately from Proposition 38. □

We now turn towards proving Theorem 37, namely, the PSpace lower bound for ∃NQI under linear TGDs. Note that the reduction in Proposition 38 does not preserve smaller constraint classes, such as linear TGDs: the constraint deriving Good has all the atoms of Q in its body. We thus prove the theorem using a separate reduction.

Proof of Theorem 37. We reduce from the implication problem for inclusion dependencies, which is known to be PSpace-hard from Casanova et al. [CFP84]. Consider a set of IDs C and an additional ID

$$ \delta = S_*(\bar x_*) \rightarrow \exists \bar y\, T_*(\bar z_*) $$

where x̄_*, ȳ are sequences of pairwise distinct variables and z̄_* is a sequence of variables from x̄_* and ȳ. We annotate the relations and variables in δ with the subscript * in order to make it clear when we later refer to these particular objects.

We create a new schema S' that contains, for each relation R of arity k in the original schema S, a relation R' of arity 2k. We also add to S' a copy of each relation R in S, without changing the arity. Furthermore, we add a 0-ary relation Good, which is the only visible relation of S'. For each ID in C of the form

$$ R(\bar x) \rightarrow \exists \bar y\, S(\bar z) $$

we introduce a corresponding ID in C' of the form

$$ R'(\bar x, \bar x') \rightarrow \exists \bar y\, S'(\bar z, \bar x') $$

where the variables in x̄' are distinct from the variables in x̄. We also add the constraints


$$ R(\bar x) \rightarrow R'(\bar x, \bar x) \qquad T'_*(\bar z_*, \bar x_*) \rightarrow \mathit{Good} $$

where the elements of z̄_* are arranged as in the atom that appears on the right-hand side of the ID δ. Note that the constraint that copies the content from R to R' and duplicates the attributes is not an ID, but is still a linear TGD. The query of our ∃NQI problem is defined as

$$ Q' = \exists \bar x\, S_*(\bar x) . $$

Note that the constraints that we just defined are preserved under disjoint unions. Thus, by Theorem 32, we know that ∃NQI(Q',C',S') = true iff NQI(Q',C',S',∅) = true. Below, we prove that the latter holds iff the ID δ is implied by the set of IDs in C.

In one direction, suppose that the implication holds. From this, we can easily infer in the schema S' the following dependency:

$$ S'_*(\bar x_*, \bar x_*) \rightarrow \exists \bar y\, T'_*(\bar z_*, \bar x_*) $$

Consider now a full S'-instance ℱ' with empty visible part that satisfies C'. We show that the query Q' is not satisfied, namely, that ℱ' cannot contain a fact of the form S_*(ā). If it did, then, by the constraint that copies S_* into S'_*, this would yield the fact S'_*(ā, ā), and hence, by the dependency above, also a fact T'_*(b̄, ā) and the fact Good. This however would contradict the hypothesis that ℱ' has empty visible part.

In the other direction, suppose that the implication fails and consider a witness S-instance ℱ that satisfies C and contains a fact S_*(ā_*) but not the corresponding T_* fact. We create a full S'-instance ℱ' with empty visible part where Q' holds, thus showing that NQI(Q',C',S',∅) = false. We first copy into ℱ' the content of all relations R from ℱ. In particular, ℱ' contains the fact S_*(ā_*), but no corresponding T_* fact. The primed relations R' in ℱ' are set to contain all and only the facts of the form R'(ā, ā_*), where R(ā) is a fact in ℱ. Finally, we set Good to be the empty relation in ℱ'. Clearly, Q' holds in ℱ' and the visible part is the empty instance. It is also easy to verify that all the constraints in C' are satisfied by ℱ', and this completes the proof. □

Note that the reduction above does not create a schema with IDs, but rather one with general linear TGDs (variables can be repeated on the right). We do not know whether ∃NQI(Q,C,S) is PSpace-hard even for constraints consisting of IDs.

We can easily see that the connectedness requirement is critical for decidability:

Theorem 39. The problem ∃NQI(Q,C,S) is undecidable as Q ranges over Boolean CQs and C over sets of FGTGDs.

Proof. We give a reduction from the model conservativity problem for EL TBoxes, which is shown undecidable in [LW07]. Intuitively, EL is a logic that defines FGTGDs over relations of arity at most 2; sets of such constraints are called "TBoxes". Given TBoxes φ_1 and φ_2 over two schemas S_1 and S_2, respectively, with S_1 ⊆ S_2, we say that φ_2 is a model conservative extension of φ_1 if every S_1-instance V that satisfies φ_1 can be extended to an S_2-instance that satisfies φ_2 without changing the interpretation of the predicates in S_1, that is, by only adding an interpretation for the relations that are in S_2 but not in S_1. The model conservativity problem consists of deciding whether φ_2 is a model conservative extension of φ_1. The proof in [LW07] shows that this problem is undecidable for both finite instances and arbitrary instances.

We reduce the above problem to the complement of ∃NQI(Q,C,S), for suitable Q, C, and S, as follows. Given TBoxes φ_1 and φ_2 over the schemas S_1 ⊆ S_2, let S be the schema obtained from S_2 by adding a new predicate Good of arity 0 and by letting the visible part be S_1 (in particular, the relation Good is hidden). Further, let C = {φ_1, Good → φ_2}, where Good → φ_2 is shorthand for the collection of FGTGDs obtained by adding Good as a conjunct to the left-hand side of each constraint of φ_2 (note that this makes the constraints unconnected). Finally, consider the query Q = Good. We have that ∃NQI(Q,C,S) = true iff there is an S_1-instance V satisfying φ_1 none of whose S_2-expansions satisfies φ_2, that is, iff φ_2 is not a model conservative extension of φ_1. □
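Why the guarded constraints Good → φ_2 become unconnected can be checked mechanically. The Python sketch below uses our own (body, head) encoding of TGDs and assumes that connectedness is measured on the graph whose nodes are the body atoms, with an edge whenever two atoms share a variable. The 0-ary atom Good has no variables, so it always forms a component of its own.

```python
def is_connected(body):
    """True iff the body atoms form one component when atoms sharing a variable are linked.

    This notion of connectedness (over body atoms only) is an assumption of the sketch.
    """
    if not body:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, atom in enumerate(body):
            if j not in seen and set(body[i][1]) & set(atom[1]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(body)


def guard_with_good(tgds):
    """Good -> phi2: add the 0-ary atom Good to the left-hand side of every TGD."""
    return [(body + [("Good", ())], head) for body, head in tgds]
```

A connected EL-style axiom such as A(x) ∧ r(x,y) → B(y) stays frontier-guarded after adding Good, but its body is no longer connected.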

4.3 Summary for Negative Query Implication 

A summary of results on negative implication is below. We note that the decidable cases are orthogonal to those for positive implications. Note also that, unlike in the positive case, we have tractable cases for data complexity.

Constraints         NQI Data                     NQI Combined                  ∃NQI
Linear TGD          PTime-cmp (Cor 29/Prop 30)   ExpTime-cmp (Cor 29/Thm 31)   PSpace-cmp (Cor 34/Thm 37)
Conn. Disj. FGTGD   ExpTime-cmp (Thm 22/Thm 23)  2ExpTime-cmp (Thm 22/Thm 23)  2ExpTime-cmp (Thm 32/Thm 36)
FGTGD & GNFO        ExpTime-cmp (Thm 22/Thm 23)  2ExpTime-cmp (Thm 22/Thm 23)  undecidable (Thm 39)


5 Extensions and special cases 

We present some results concerning natural extensions of the framework. 

First, note that throughout this work we have restricted to queries given by Boolean UCQs. The natural extension of the notion of query implication to non-Boolean queries is to consider disclosure of information concerning membership of any visible tuple in the query output. For example, PQI(Q,C,S,V) would hold if, for some tuple t over the active domain of V, t ∈ Q(ℱ) for every instance ℱ of S satisfying the constraints and having visible part V. We show that all of our results carry over to the non-Boolean case.

Since the lower bounds for Boolean problems are clearly inherited by the non-Boolean ones, we focus on whether the upper bounds carry over.

All the complexity upper bounds for the instance-level problems carry over straightforwardly: for each candidate output tuple over the active domain of V, we substitute it for the free variables of the query and run the prior algorithms on the resulting Boolean query. Each substitution preserves the upper bounds, since they hold in the presence of constants, and the iteration over tuples can be absorbed in the complexity classes given in our upper bounds: for data complexity the number of iterations is polynomial, while for combined complexity the number of tuples can be exponential, but our bounds are at least exponential. Further, GFP-Datalog definability for negative implications also extends straightforwardly to the non-Boolean case: Theorem 27 extends with the same statement and proof, while the argument in Theorem 28 is easily extended to show that there is a GFP-Datalog program that returns the complement of NQI(Q,C,S) within the active domain.
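The reduction from the non-Boolean to the Boolean instance-level problems can be sketched as follows. This Python illustration uses a hypothetical encoding in which substituted constants are wrapped as ("const", value) to keep them apart from variables; each generated Boolean CQ would then be passed to one of the Boolean procedures, which is not shown. For k free variables there are |adom(V)|^k instantiations, which is polynomial in V for a fixed query.

```python
from itertools import product


def boolean_instantiations(cq_atoms, free_vars, visible_instance):
    """Yield (t, Boolean CQ) for each tuple t of length len(free_vars) over adom(V).

    Constants are wrapped as ("const", value); this encoding is an assumption.
    """
    adom = sorted({v for _, args in visible_instance for v in args}, key=repr)
    for t in product(adom, repeat=len(free_vars)):
        subst = {x: ("const", c) for x, c in zip(free_vars, t)}
        yield t, [(rel, tuple(subst.get(x, x) for x in args)) for rel, args in cq_atoms]
```

A tuple t is then reported as implied exactly when the Boolean procedure accepts its instantiation.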

The complexity results for ∃PQI also generalize to the non-Boolean case: we can revise Theorem 10 to state that ∃PQI(Q,C,S) = true iff there is a positive query implication for the tuple (a,...,a) and the instance V_{a}. For ∃NQI, we can extend Theorem 32 to show that, for constraints preserved under disjoint unions, if there is a negative query implication involving some visible instance V and a tuple t, then there is one involving the empty instance and some tuple t. From this it follows that the complexity bounds for ∃NQI carry over to the non-Boolean case.

Beyond conjunctive queries. So far we have considered only the case where Q is a UCQ. It is natural to extend the query language even further, to Boolean combinations of Boolean conjunctive queries (BCCQs). We note that the problem PQI(Q,C,S,V), as Q ranges over BCCQs, subsumes both PQI(Q,C,S,V) and NQI(Q,C,S,V) for Q a UCQ, since NQI(Q,C,S,V) coincides with PQI(¬Q,C,S,V). Thus all lower bounds for either of these two problems are inherited by the BCCQ problem. The corresponding instance-level problems are still decidable. Indeed, this holds even when Q is a GNFO sentence: we can just use the same translation to GNFO satisfiability applied in Theorems 4 and 22. However, for the schema-level problems ∃PQI and ∃NQI we immediately run into problems:

Theorem 40. The problem ∃PQI(Q,C,S) for a Boolean combination Q of CQs is undecidable, even when the constraints are IDs. The same holds for ∃NQI(Q,C,S).

Proof. As in the previous undecidability results, we reduce a tiling problem with tiles T, initial tile t_0 ∈ T, and horizontal and vertical constraints H, V ⊆ T × T to the problem ∃PQI(Q,C,S). Again, for convenience we deal with the infinite variant of the problem. The idea is that the visible instance witnessing ∃PQI represents the tiling, and invisible instances represent challenges to the correctness of the tiling.

We model the infinite grid to be tiled by visible relations E_H and E_V, and the tiling function by a collection of unary visible relations U_t, for all tiles t ∈ T. The invisible relations represent markings of the grid for possible errors. There are several kinds of challenges. We focus on the horizontal consistency challenge, which selects two nodes in the E_H relation to challenge whether the nodes satisfy the horizontal constraint. Formally, the challenge is captured by a binary invisible predicate HorChallenge(x,y), with an associated integrity constraint

$$ \mathit{HorChallenge}(x,y) \rightarrow E_H(x,y) . $$

The query Q will be satisfied only when the following negated CQs hold, for all pairs (t,t') ∉ H:

$$ \neg \exists x\, y\, \big( \mathit{HorChallenge}(x,y) \wedge U_t(x) \wedge U_{t'}(y) \big) . $$

Note that one of these negated CQs can be violated only if the relation HorChallenge has selected two horizontally adjacent nodes whose tiles violate the horizontal constraints. The vertical constraints are enforced in a similar way using an invisible relation VertChallenge and another negated CQ.
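The adversarial reading of the horizontal challenge can be illustrated on a finite fragment of a candidate tiling. In this Python sketch (names and encoding are our own; tile_of assigns exactly one tile to each node, which is assumed here rather than enforced), the function returns a pair that HorChallenge could select to violate one of the negated CQs, or None when every E_H-edge respects H.

```python
def horizontal_violation(e_h, tile_of, h_allowed):
    """Return some (x, y) in E_H with (tile(x), tile(y)) not in H, or None.

    Hypothetical helper: tile_of is assumed to map every node to a single tile.
    """
    for x, y in sorted(e_h):
        if (tile_of[x], tile_of[y]) not in h_allowed:
            return (x, y)
    return None
```

A visible instance can witness ∃PQI only if no such pair exists, since otherwise some full instance marks that pair and falsifies Q.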

Recall that in the infinite grid, each node has unique vertical and horizontal successors, and the horizontal and vertical successor functions commute. Thus far we have not enforced that E_V and E_H have these properties. We will use additional hidden relations and IDs to enforce that every element is related to at least one other element via E_H and E_V.

We first show how to enforce that every element has at most one horizontal successor ("functionality challenge"). We introduce a hidden relation HorFuncChallenge(x,y,y') and constraints

$$ \mathit{HorFuncChallenge}(x,y,y') \rightarrow E_H(x,y) \qquad \mathit{HorFuncChallenge}(x,y,y') \rightarrow E_H(x,y') . $$

We also add to the query Q the conjunct:

$$ \big( \neg \exists x\, y\, y'\, \mathit{HorFuncChallenge}(x,y,y') \big) \vee \big( \exists x\, y\, \mathit{HorFuncChallenge}(x,y,y) \big) . $$

We claim that if there is a visible instance witnessing ∃PQI, then E_H is functional. Indeed, if E_H were not functional in the visible instance, then we could choose a node x with two distinct E_H-successors y and y', add only the tuple (x,y,y') to HorFuncChallenge, and obtain a full instance that satisfies the constraints but not the query Q. Conversely, suppose that E_H is functional in a visible instance V, and consider any full instance ℱ that satisfies the constraints and agrees with V on the visible part. If there are no tuples in HorFuncChallenge, the conjunct above is clearly satisfied by its first disjunct. If there is some tuple (x,y,y') in HorFuncChallenge, then by the constraints we must have E_H(x,y) and E_H(x,y'), and hence, by functionality, y = y'. In this case, the conjunct above holds via the second disjunct. The functionality of the vertical relation E_V is enforced in an analogous way.
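The functionality challenge can be sketched in the same style. Under our own encoding (E_H as a set of pairs, HorFuncChallenge as a set of triples, hypothetical function names), the first function finds a tuple that could be added to HorFuncChallenge to falsify Q, and the second evaluates the query conjunct above on a given HorFuncChallenge relation.

```python
from collections import defaultdict


def functionality_challenge(e_h):
    """Return (x, y, y2) with y != y2, E_H(x, y) and E_H(x, y2), or None if E_H is functional."""
    succ = defaultdict(set)
    for x, y in e_h:
        succ[x].add(y)
    for x in sorted(succ):
        ys = sorted(succ[x])
        if len(ys) > 1:
            return (x, ys[0], ys[1])
    return None


def func_conjunct_holds(hor_func_challenge):
    """(not exists x y y'. HFC(x, y, y')) or (exists x y. HFC(x, y, y))."""
    return not hor_func_challenge or any(y == y2 for _, y, y2 in hor_func_challenge)
```

When E_H is functional, the constraints force every tuple of HorFuncChallenge to have y = y', so the conjunct always holds, matching the argument above.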





Commutativity of E_H and E_V can also be enforced using a similar technique. We add a hidden relation ConfChallenge(x,y,z,u,v) with constraints:

$$ \mathit{ConfChallenge}(x,y,z,u,v) \rightarrow E_H(x,y) \qquad \mathit{ConfChallenge}(x,y,z,u,v) \rightarrow E_V(y,u) $$

$$ \mathit{ConfChallenge}(x,y,z,u,v) \rightarrow E_V(x,z) \qquad \mathit{ConfChallenge}(x,y,z,u,v) \rightarrow E_H(z,v) $$

A potential tuple in ConfChallenge(x,y,z,u,v) represents the join of a triple of nodes moving first horizontally and then vertically from x (i.e., x, y, u) and a triple going first vertically and then horizontally from x (i.e., x, z, v). For the relations to commute, we must satisfy the query

$$ \big( \neg \exists x\, y\, z\, u\, v\, \mathit{ConfChallenge}(x,y,z,u,v) \big) \vee \big( \exists x\, y\, z\, u\, \mathit{ConfChallenge}(x,y,z,u,u) \big) $$

in the full instance. Thus, we add the above conjunct to Q.

Putting together the components of Q for the different challenges as a Boolean combination of CQs completes the proof of the theorem. □
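The commutativity challenge amounts to searching for two paths from a node x that end at different nodes. The following Python sketch (our own encoding, with E_H and E_V as sets of pairs and a hypothetical function name) returns a tuple that ConfChallenge could contain while violating the conjunct above, or None if the two compositions agree on the given finite fragment.

```python
def confluence_challenge(e_h, e_v):
    """Find (x, y, z, u, v) with E_H(x,y), E_V(y,u), E_V(x,z), E_H(z,v) and u != v."""
    for x, y in sorted(e_h):
        for y2, u in sorted(e_v):
            if y2 != y:
                continue
            for x2, z in sorted(e_v):
                if x2 != x:
                    continue
                for z2, v in sorted(e_h):
                    if z2 == z and u != v:
                        return (x, y, z, u, v)
    return None
```

On a 2×2 grid fragment both paths from the origin meet at the same corner, so no challenge is found; redirecting one vertical edge produces one.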

The case of conjunctive query views. As mentioned earlier, the database community has studied the PQI problem in the case where the constraints consist exactly of CQ-view definitions defining each visible relation in terms of invisible relations. Formally, a CQ-view based scenario consists of a schema S = S_v ∪ S_h, namely, the union of a schema for the visible relations and a schema for the hidden relations, and a set of constraints C between visible and hidden relations that must be of a particular form. For each visible relation R ∈ S_v, C must contain two dependencies of the form

$$ R(\bar x) \rightarrow \exists \bar y\, \phi_R(\bar x,\bar y) \qquad \phi_R(\bar x,\bar y) \rightarrow R(\bar x) $$

where φ_R is a conjunction of atoms over the hidden schema S_h. Furthermore, all constraints in C must be of the above forms. Note that this CQ-view scenario is incomparable in expressiveness to GNFO constraints.
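Taken together, the two dependencies say that R coincides with the output of the CQ ∃ȳ φ_R. The Python sketch below checks this on a finite instance, using a naive backtracking CQ evaluator written for illustration. The encoding and the function names are our own assumptions, body atoms contain only variables, and the head variables must occur in the body.

```python
def eval_cq(head_vars, body, instance):
    """All tuples over head_vars that extend to a homomorphism from body into instance."""
    results = set()

    def extend(i, assignment):
        if i == len(body):
            results.add(tuple(assignment[x] for x in head_vars))
            return
        rel, args = body[i]
        for frel, fargs in instance:
            if frel != rel or len(fargs) != len(args):
                continue
            new = dict(assignment)
            # setdefault binds unbound variables and checks bound ones for consistency
            if all(new.setdefault(x, v) == v for x, v in zip(args, fargs)):
                extend(i + 1, new)

    extend(0, {})
    return results


def satisfies_view(view_rel, head_vars, body, instance):
    """Both dependencies of the CQ-view definition hold iff R equals the CQ output."""
    view = {args for rel, args in instance if rel == view_rel}
    return view == eval_cq(head_vars, body, instance)
```

For a 0-ary view relation, head_vars is empty and the CQ output is either {()} or the empty set, matching a proposition that is true or false.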

The instance-level problems are still well-behaved, because given a visible instance V, the constraints can be rewritten as C_1 ∧ C_2, where C_1 consists of TGDs from the view relations to the base relations, and C_2 consists of disjunctive linear EGDs from the base relations to the various possible tuples in the view relations. Thus the "disjunctive chase" of V with these constraints terminates after a finite number of rounds. From this we can directly argue that a counterexample superinstance for either PQI or NQI must be of polynomial size.

The decidability of the ∃PQI problem follows immediately from these observations and Theorem 10, which applies to constraints capturing CQ-view definitions. In contrast, for the ∃NQI problem we prove the following.

Theorem 41. The ∃NQI problem under constraints given as CQ-view definitions is undecidable.
Proof. We give a reduction from a tiling problem that is specified by a set of tiles T, an initial tile t_0 ∈ T, and horizontal and vertical constraints H, V ⊆ T × T.

As before, we will deal for simplicity with the infinite variant, thus considering the problem of tiling the infinite grid ℕ × ℕ.

As usual, we will have visible relations E_H and E_V representing the horizontal and vertical edges of the grid. As these relations must be associated with CQ-view definitions, we add hidden copies E'_H and E'_V of them and enforce the trivial dependencies:

$$ E_H(x,y) \leftrightarrow E'_H(x,y) \qquad E_V(x,y) \leftrightarrow E'_V(x,y) . $$

Similarly, each node of the grid has to be associated with a tile in T, and this will be represented by a tuple of visible unary relations U_t and hidden copies U'_t, constrained by the dependencies U_t(x) ↔ U'_t(x) for all t ∈ T.

As in earlier undecidability results, such as Theorem 40, the first goal is to ensure that each node has at most one predecessor and at most one successor in each of the relations E_H and E_V. We explain how to ensure this for the successor case and the relation E_H; similar constructions work for the other cases. We introduce a hidden relation HorFuncChallenge of arity 4, and a visible relation ErrHorFun of arity 3 with the associated CQ-view definition

$$ \mathit{ErrHorFun}(x,y,x') \leftrightarrow \mathit{HorFuncChallenge}(x,y,x',y) . $$

Our query Q will contain as a conjunct the following UCQ:

$$ Q_{\mathit{HorFuncChallenge}} = \big( \exists x\, y\, y'\, \mathit{ErrHorFun}(x,y,y') \big) \vee \big( \exists x\, y\, y'\, \mathit{HorFuncChallenge}(x,y,x,y') \wedge E_H(x,y) \wedge E_H(x,y') \big) . $$

We explain how the subquery Q_HorFuncChallenge enforces that every element has at most one successor in the relation E_H.

Suppose that ∃NQI(Q_HorFuncChallenge, C, S) = true, namely, that there exists an S_v-instance V such that NQI(Q_HorFuncChallenge, C, S, V) = true. The visible relation ErrHorFun must be empty in V, as otherwise the query Q_HorFuncChallenge would be satisfied in every full instance that agrees with V on the visible part (note that V is clearly realizable). Moreover, as ErrHorFun is empty in V, no full instance that satisfies the constraints and agrees with V contains a fact of the form HorFuncChallenge(x,y,x',y). Now, suppose, by way of contradiction, that there is an element x with two distinct E_H-successors y and y'. We can construct a full instance that extends V with the single fact HorFuncChallenge(x,y,x,y'); since y ≠ y', this fact does not match the pattern HorFuncChallenge(x,y,x',y), so ErrHorFun remains empty. This full instance satisfies all the constraints in C and also the query Q_HorFuncChallenge, thus contradicting NQI(Q_HorFuncChallenge, C, S, V) = true.

For the converse direction, we aim at proving that there is a negative query implication on Q_HorFuncChallenge for those instances that encode valid tilings and are realizable. More precisely, we consider a visible instance V in which the relation E_H is a function and the relation ErrHorFun is empty (note that the latter condition on ErrHorFun is safe, in the sense that the considered instance V could be obtained from a valid tiling and, being realizable, could be used to witness a negative query implication). We claim that NQI(Q_HorFuncChallenge, C, S, V) = true. Consider an arbitrary full instance ℱ that agrees with V on the visible part and satisfies the constraints in C, and suppose by way of contradiction that Q_HorFuncChallenge holds on ℱ. Since ErrHorFun is empty, the second disjunct must hold, so ℱ contains the following facts, for a triple of nodes x, y, y': HorFuncChallenge(x,y,x,y'), E_H(x,y), E_H(x,y'). On the other hand, ℱ cannot contain a fact of the form HorFuncChallenge(x,y,x',y), as otherwise the visible fact ErrHorFun(x,y,x') would be present. From this we conclude that y ≠ y', which contradicts the functionality of E_H.

Very similar constructions and arguments can be used to enforce single successors in E_V, single predecessors in E_H and E_V, as well as confluence of E_H and E_V.

We now explain how we enforce the existential properties of the grid, such as E_H being non-empty. We introduce two 0-ary relations HorEmptyError and HorEmptyHiddenError, where the former is visible and the latter is hidden, and constrain them via the CQ-view definition

$$ \mathit{HorEmptyError} \leftrightarrow \exists x\, y\, \big( E_H(x,y) \wedge \mathit{HorEmptyHiddenError} \big) . $$

We add as a conjunct of our query the following UCQ:

$$ Q_{\mathit{HorEmptyError}} = \mathit{HorEmptyError} \vee \mathit{HorEmptyHiddenError} . $$

Below, we show how this enforces non-emptiness of E_H.

Suppose that V is an S_v-instance such that NQI(Q_HorEmptyError, C, S, V) = true. We show that in this case the relation E_H is non-empty. First, note that the fact HorEmptyError must not appear in V, since otherwise all full instances extending V would satisfy Q_HorEmptyError (as V is realizable, there is at least one such full instance). If E_H were empty, we could make HorEmptyHiddenError non-empty without making HorEmptyError true, and thus get a contradiction of NQI(Q_HorEmptyError, C, S, V) = true.

For the converse direction, we consider a visible instance V in which the relation E_H is non-empty and HorEmptyError is empty (again, such an instance can be obtained from a valid tiling of the infinite grid and thus can be used to witness a negative query implication). In any full instance that agrees with V on the visible part, HorEmptyHiddenError must agree with HorEmptyError, and hence must be empty. This implies that the query Q_HorEmptyError is violated, whence NQI(Q_HorEmptyError, C, S, V) = true.

Besides requiring that E_H and E_V are non-empty, we must also guarantee that for every pair (x,y) ∈ E_H (resp., (x,y) ∈ E_V), there is a pair (y,z) ∈ E_V (resp., (y,z) ∈ E_H). Note that once we have done this, functionality and confluence will ensure that E_H and E_V correctly encode the horizontal and vertical edges of the grid. We explain how to enforce that every pair (x,y) ∈ E_H has a successor pair (y,z) ∈ E_V; a similar construction can be given for the symmetric property. We add to our schema another visible relation HorSuccError of arity 0, and a hidden relation HorSuccHiddenError of arity 1. The associated CQ-view definition is

$$ \mathit{HorSuccError} \leftrightarrow \exists x\, y\, z\, E_H(x,y) \wedge \mathit{HorSuccHiddenError}(y) \wedge E_V(y,z) . $$

Moreover, we add as a conjunct of our query the following UCQ:

$$ Q_{\mathit{HorSuccError}} = \mathit{HorSuccError} \vee \big( \exists x\, y\, E_H(x,y) \wedge \mathit{HorSuccHiddenError}(y) \big) . $$

We show how this enforces the desired property.

Suppose that there is a visible instance V such that NQI(Q_HorSuccError, C, S, V) = true. First, observe that the visible relation HorSuccError must be empty, as otherwise all extensions of V would satisfy Q_HorSuccError. Now, suppose, by way of contradiction, that there is a pair (x,y) ∈ E_H that has no successor pair (y,z) ∈ E_V. In this case, we can construct a full instance that extends V with the hidden fact HorSuccHiddenError(y); since y has no E_V-successor, the body of the CQ-view definition is not satisfied and HorSuccError remains empty.

This full instance has V as visible part and satisfies the constraints and the query Q_HorSuccError. As this contradicts the hypothesis NQI(Q_HorSuccError, C, S, V) = true, we conclude that for every pair (x,y) ∈ E_H, there is a successor pair (y,z) ∈ E_V.

Conversely, consider a visible instance $V$ that represents a correct encoding of the infinite grid and in which the visible relation HorSuccError is empty. In any full instance that agrees with $V$ on the visible part and satisfies the constraints, HorSuccError must coincide with $\exists x\,y\,z\; (E_H(x,y) \wedge \mathrm{HorSuccHiddenError}(y) \wedge E_V(y,z))$. Because every node has both a successor in $E_H$ and a successor in $E_V$, this implies that the hidden relation HorSuccHiddenError cannot contain the node $y$ for any pair $(x,y) \in E_H$. Hence the query $Q_{\mathrm{HorSuccError}}$ is necessarily violated, which proves that $\mathrm{NQI}(Q_{\mathrm{HorSuccError}}, C, S, V) = \mathit{true}$.
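Both directions of this argument can be replayed mechanically on small finite instances. The following Python sketch is only an illustration under our own assumptions, not part of the construction: it treats the HorSuccError gadget in isolation (ignoring the remaining constraints of the reduction), represents relations as Python sets, and uses a hypothetical $n \times n$ toroidal grid as a visible instance in which every node has both an $E_H$ and an $E_V$ successor. Elements outside the active domain of $E_H$ and $E_V$ affect neither the view body nor $Q_{\mathrm{HorSuccError}}$, so enumerating subsets of the active domain as candidate contents of HorSuccHiddenError suffices for this gadget. All function names are hypothetical.

```python
from itertools import combinations


def subsets(elems):
    """Yield every subset of a finite collection as a frozenset."""
    elems = list(elems)
    for r in range(len(elems) + 1):
        for combo in combinations(elems, r):
            yield frozenset(combo)


def view_body(e_h, e_v, hidden):
    """exists x, y, z. E_H(x, y) and HorSuccHiddenError(y) and E_V(y, z)."""
    has_v_succ = {y for (y, _) in e_v}
    return any(y in hidden and y in has_v_succ for (_, y) in e_h)


def query(flag, e_h, hidden):
    """Q_HorSuccError = HorSuccError or exists x, y. E_H(x, y) and HorSuccHiddenError(y)."""
    return flag or any(y in hidden for (_, y) in e_h)


def nqi_hor_succ(e_h, e_v, flag):
    """True iff the query fails in every hidden extension consistent with the view.

    `flag` encodes the visible 0-ary relation HorSuccError (True = non-empty).
    """
    domain = {a for pair in e_h | e_v for a in pair}
    for hidden in subsets(domain):
        if view_body(e_h, e_v, hidden) == flag and query(flag, e_h, hidden):
            return False  # a consistent extension satisfies the query
    return True


def torus(n):
    """Hypothetical example: an n x n toroidal grid, so every node has both successors."""
    e_h = {((i, j), ((i + 1) % n, j)) for i in range(n) for j in range(n)}
    e_v = {((i, j), (i, (j + 1) % n)) for i in range(n) for j in range(n)}
    return e_h, e_v


# Every node has both successors: the query is necessarily violated.
print(nqi_hor_succ(*torus(2), False))  # True
# A dangling horizontal edge (a, b) whose target b has no vertical successor.
print(nqi_hor_succ({("a", "b")}, set(), False))  # False
```

The exhaustive enumeration is exponential in the size of the active domain; it is meant only to make the case analysis concrete on toy inputs.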

Now that we have enforced a grid-like structure on the relations $E_H$ and $E_V$, we consider the relations $U_t$ that encode a candidate tiling function. Using similar techniques, we can ensure that every node of the grid has an associated tile. More precisely, we enforce that, for every pair $(x,y) \in E_H$, the element $x$ appears in $U_t$ for some tile $t \in T$. We add a visible relation $\mathrm{HorLabelError}_t$ of arity 0 for each tile $t \in T$ and a hidden relation HorLabelHiddenError of arity 1. The associated CQ-view definitions are of the form

$$\mathrm{HorLabelError}_t \leftrightarrow \exists x\,y\; \big(E_H(x,y) \wedge \mathrm{HorLabelHiddenError}(x) \wedge U_t(x)\big).$$

We add the following UCQ as a disjunct of our query:

$$Q_{\mathrm{HorLabelError}} = \bigvee_{t \in T} \mathrm{HorLabelError}_t \;\vee\; \exists x\,y\; \big(E_H(x,y) \wedge \mathrm{HorLabelHiddenError}(x)\big).$$

We prove that the above definitions enforce that all nodes appearing in the first column of the relation $E_H$ have at least one associated tile.

Consider a visible instance $V$ such that $\mathrm{NQI}(Q_{\mathrm{HorLabelError}}, C, S, V) = \mathit{true}$. For each tile $t$, the visible relation $\mathrm{HorLabelError}_t$ must be empty, as otherwise all extensions of $V$ would satisfy $Q_{\mathrm{HorLabelError}}$. Suppose, by way of contradiction, that there is a node $x$ that appears in the first column of the visible relation $E_H$ but does not appear in any relation $U_t$ with $t \in T$. We can construct a full instance in which the relation HorLabelHiddenError contains the element $x$; since $x$ belongs to no $U_t$, every $\mathrm{HorLabelError}_t$ remains empty, as required by $V$. This instance would then satisfy the query $Q_{\mathrm{HorLabelError}}$, contradicting $\mathrm{NQI}(Q_{\mathrm{HorLabelError}}, C, S, V) = \mathit{true}$.

For the converse, consider a visible instance $V$ in which the relation $E_H$ is non-empty (as enforced in the previous steps) and, for every pair $(x,y) \in E_H$, there is a tile $t \in T$ such that $x \in U_t$. Furthermore, assume that all the relations $\mathrm{HorLabelError}_t$, with $t \in T$, are empty in this visible instance. Such an instance $V$ can be obtained from a valid tiling (if there is any), and hence can serve as a witness of a negative query implication. In every full instance that agrees with $V$ and satisfies the constraints, $\mathrm{HorLabelError}_t$ must coincide with $\exists x\,y\; (\mathrm{HorLabelHiddenError}(x) \wedge E_H(x,y) \wedge U_t(x))$. Because every node is associated with some tile, this implies that the hidden relation HorLabelHiddenError cannot contain the node $x$ for any pair $(x,y) \in E_H$. Hence the query $Q_{\mathrm{HorLabelError}}$ is necessarily violated, which proves that $\mathrm{NQI}(Q_{\mathrm{HorLabelError}}, C, S, V) = \mathit{true}$.
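The same kind of exhaustive check applies to the tile-labelling gadget. Again this is a hypothetical sketch under our own encoding: the relations $U_t$ are a dictionary from tiles to node sets, the 0-ary relations $\mathrm{HorLabelError}_t$ are Booleans, and the gadget is considered in isolation from the other constraints. Only nodes in the first column of $E_H$ can influence the view bodies or the query, so it suffices to range HorLabelHiddenError over subsets of that column.

```python
from itertools import combinations


def subsets(elems):
    """Yield every subset of a finite collection as a frozenset."""
    elems = list(elems)
    for r in range(len(elems) + 1):
        for combo in combinations(elems, r):
            yield frozenset(combo)


def nqi_hor_label(e_h, labels, flags):
    """True iff Q_HorLabelError fails in every hidden extension consistent with the views.

    labels: tile -> set of nodes (the visible relations U_t)
    flags:  tile -> bool (the visible 0-ary relations HorLabelError_t)
    """
    firsts = {x for (x, _) in e_h}
    # Only first-column nodes of E_H matter, so HorLabelHiddenError ranges over subsets of them.
    for hidden in subsets(firsts):
        # View for each t: HorLabelError_t <-> exists x, y. E_H(x, y) and Hidden(x) and U_t(x)
        consistent = all(flags[t] == bool(hidden & nodes) for t, nodes in labels.items())
        # Query: OR_t HorLabelError_t, or exists x, y. E_H(x, y) and Hidden(x)
        satisfied = any(flags.values()) or bool(hidden)
        if consistent and satisfied:
            return False  # a consistent extension satisfies the query
    return True


e_h = {("p", "q"), ("q", "p")}
print(nqi_hor_label(e_h, {"A": {"p"}, "B": {"q"}}, {"A": False, "B": False}))  # True
# Node q carries no tile, so it can be placed in HorLabelHiddenError undetected.
print(nqi_hor_label(e_h, {"A": {"p"}, "B": set()}, {"A": False, "B": False}))  # False
```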

We also need to guarantee that each node has at most one associated tile. 
This property can be easily enforced by the subquery 

$$Q_{\mathrm{TwoLabelsError}} = \bigvee_{t, t' \in T,\ t \neq t'} \exists x\; \big(U_t(x) \wedge U_{t'}(x)\big).$$
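Since this subquery mentions only visible relations, it can be evaluated directly on $V$. A minimal Python sketch, assuming (our own encoding, not the paper's) that the relations $U_t$ are given as a dictionary from tiles to node sets; the function name is hypothetical:

```python
from itertools import combinations


def two_labels_error(labels):
    """Q_TwoLabelsError: some node x lies in both U_t and U_t' for distinct tiles t, t'.

    labels maps each tile t to the set of nodes in U_t.
    Unordered pairs suffice because the condition is symmetric in t and t'.
    """
    return any(labels[t] & labels[u] for t, u in combinations(labels, 2))


print(two_labels_error({"A": {1, 2}, "B": {3}}))  # False
print(two_labels_error({"A": {1, 2}, "B": {2}}))  # True
```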

Finally, we enforce that the encoded tiling function respects the horizontal and 
vertical constraints using the following UCQ: 

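$$Q_{\mathrm{ConstraintError}} = \bigvee_{(t,t')\ \text{not horizontally compatible}} \exists x\,y\; \big(E_H(x,y) \wedge U_t(x) \wedge U_{t'}(y)\big) \;\vee\; \bigvee_{(t,t')\ \text{not vertically compatible}} \exists x\,y\; \big(E_V(x,y) \wedge U_t(x) \wedge U_{t'}(y)\big).$$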
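Like $Q_{\mathrm{TwoLabelsError}}$, the subquery $Q_{\mathrm{ConstraintError}}$ mentions only visible relations and can be evaluated directly on $V$. In the hypothetical sketch below, the horizontal and vertical compatibility relations of the tiling problem are assumed to be given as sets `horiz_ok` and `vert_ok` of allowed tile pairs; these names, and the tiny periodic example instance, are ours.

```python
def constraint_error(e_h, e_v, labels, horiz_ok, vert_ok):
    """Evaluate Q_ConstraintError on finite visible relations.

    labels:   tile -> set of nodes (the relations U_t)
    horiz_ok: set of pairs (t, t') allowed as horizontal neighbours
    vert_ok:  set of pairs (t, t') allowed as vertical neighbours
    """
    def violates(edges, ok):
        # exists x, y. E(x, y) and U_t(x) and U_t'(y) for some pair (t, t') not in ok
        return any(
            (t, u) not in ok
            for (x, y) in edges
            for t, xs in labels.items() if x in xs
            for u, ys in labels.items() if y in ys
        )

    return violates(e_h, horiz_ok) or violates(e_v, vert_ok)


# Hypothetical periodic instance: p and q alternate horizontally; each is its own vertical successor.
e_h = {("p", "q"), ("q", "p")}
e_v = {("p", "p"), ("q", "q")}
labels = {"A": {"p"}, "B": {"q"}}
print(constraint_error(e_h, e_v, labels, {("A", "B"), ("B", "A")}, {("A", "A"), ("B", "B")}))  # False
print(constraint_error(e_h, e_v, labels, {("A", "B"), ("B", "A")}, {("A", "A")}))  # True
```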

Summing up, if we let $Q$ be the disjunction of all previous queries, then $\exists\mathrm{NQI}(Q, C, S) = \mathit{true}$ if and only if there exists a valid tiling of the infinite grid $\mathbb{N} \times \mathbb{N}$. □


6 Conclusions 

This work gives a detailed examination of disclosure of query results from schemas with hidden relations, where disclosure arises from integrity constraints written in expressive constraint languages. In future work we will look at mechanisms for “restricted access” that are finer-grained than exposing the full contents of a subset of the schema relations, such as language-based restrictions (the user can pose only queries within a certain language).